Genetic locus for everninomicin biosynthesis

ABSTRACT

The present invention relates to isolated genetic sequences encoding proteins which direct the biosynthesis of the antibiotic everninomicin in Micromonospora carbonacea. The isolated biosynthetic gene cluster serves as a substrate for bioengineering of antibiotic structures.

CROSS REFERENCE TO RELATED APPLICATION

[0001] This application claims benefit under 35 U.S.C. §119 of provisional application U.S. Ser. No. 60/177,170, filed on Jan. 27, 2000, which is herein incorporated by reference in its entirety for all purposes.

FIELD OF INVENTION

[0002] The present invention relates to the field of antibiotics, specifically those active against gram-positive bacteria and more specifically to genes of the everninomicin biosynthetic pathway of Micromonospora carbonacea. In particular, this invention elucidates the gene cluster controlling the biosynthesis of everninomicin.

BACKGROUND

[0003] Everninomicin is one member of a class of oligosaccharide natural products collectively referred to as the orthosomycins. At least five active components of everninomicin have been obtained by fermentation of M. carbonacea, namely everninomicin A, B, C, D, and E, of which everninomicin D is the principal component (Weinstein et al., Antimicrobial Agents and Chemotherapy—1964, 24-32, 1964; U.S. Pat. No. 3,499,078). Additional everninomicins, including 13-384 component 1 and 13-384 component 5, have been described from other strains of M. carbonacea (Ganguly et al., Heterocycles, 1989, Vol. 28, pp. 83-88; U.S. Pat. Nos. 4,597,968 and 4,735,903). The structures of some of the known everninomicins are described in Encyclopedia of Chemical Technology, 4th edition, volume 3, 1992, pp. 60-261, ed. Mary Howe-Grant, from which the chemical structure of everninomicin, as illustrated in FIG. 2 of the present specification, was derived.

[0004] Everninomicins contain two sensitive orthoester moieties and one or more highly substituted aromatic moieties. Everninomicins possess many unusual features, including a 1-1′ disaccharide bridge, a nitrosugar (evernitrose), thirteen rings, and thirty-five stereogenic centers within their structure (Ganguly A. K. et al., Tetrahedron Lett. 1997, 38, 7989-7991). It has been recognized that everninomicin constitutes a formidable challenge to organic synthesis because of its unusual connectivity and polyfunctional and sensitive nature (Nicolaou, K. C. et al., Angew. Chem. Int. Ed. 1999, 38, No. 22). Moreover, chemical synthesis of everninomicin compounds produces a poor yield of the desired everninomicin molecule due to the presence of the unusual structural features. As an alternative to making structural analogs of microbial metabolites by chemical synthesis, manipulating the genes governing secondary metabolism offers a promising approach and allows for preparation of these compounds biosynthetically. However, the success of a biosynthetic approach depends critically on the availability of novel genetic systems and on genes encoding novel enzyme activities. Elucidation of the everninomicin gene cluster contributes to the general field of combinatorial biosynthesis by expanding the repertoire of genes uniquely associated with everninomicin biosynthesis, leading to the making of novel everninomicins via combinatorial biosynthesis.

[0005] The emergence of multi-resistant, gram-positive pathogens gives rise to an urgent need for new antimicrobial agents that display novel mechanisms of action and demonstrate activity against resistant strains. Everninomicin has demonstrated a wide spectrum of antibacterial activity against gram-positive organisms, including methicillin-resistant Staphylococcus aureus, vancomycin-resistant enterococci, and penicillin-resistant pneumococci. The production of everninomicin is recognized as a valuable source of antibiotics. For example, everninomicin (trade name Ziracin®) was under development by Schering-Plough as an intravenous treatment of severe resistant gram-positive bacterial infections. Consequently, it is desirable to develop cost-effective means to produce everninomicin. Elucidation of the everninomicin gene cluster would provide a means to construct everninomicin-overproducing strains by de-regulating the biosynthetic machinery.

[0006] It is also desirable to produce chemical modifications of everninomicin to enhance certain properties. For example, everninomicin D presented pharmacokinetic problems when tested in vivo in mice and dogs (Ganguly A. K. et al., J. Antibiotics 35:5 561-570, 1982). Likewise, it has been reported that everninomicins have been unavailable for clinical use due to severe adverse reactions observed in laboratory animals, which reactions include lack of coordination and ataxia (Maertens, Current Opinion in Anti-infective Investigational Drugs, 1999 1(1):49-56). Elucidation of the everninomicin gene cluster would provide a means to produce, via genetic manipulation or combinatorial biosynthesis, modified everninomicin D with improved properties. Elucidation of the gene cluster controlling the biosynthesis of everninomicin would provide access to rational engineering of everninomicin biosynthesis for novel drug leads. Accordingly, there is a need for genetic information regarding the biosynthesis of everninomicin.

SUMMARY OF THE INVENTION

[0007] The invention provides purified and isolated polynucleotide molecules that encode polypeptides of the everninomicin biosynthetic pathway in Micromonospora carbonacea. In one form of the invention, polynucleotide molecules are selected from contiguous DNA sequences of FIG. 1 (SEQ ID NOS: 1, 3, 4, 8, 22, 36, 47 and 49). In another form, the invention provides polypeptides corresponding to the isolated DNA molecules. The amino acid sequences of the corresponding encoded polypeptides are also shown in FIG. 1.

[0008] Structural and functional characterization is provided for the 49 open reading frames (ORFs) comprising this cluster (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58). Thus, in one embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid selected from the group consisting of a nucleic acid encoding any of everninomicin ORFs 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58); a nucleic acid encoding a polypeptide encoded by any of everninomicin ORFs 1 to 49; and a nucleic acid encoding a polypeptide which is at least 75% (preferably 80%, more preferably 85% or more) identical in amino acid sequence to a polypeptide encoded by any of everninomicin ORFs 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58). Certain embodiments of the invention specifically exclude one or more of ORFs 1 to 49. In one embodiment, preferred nucleic acids comprise a nucleic acid encoding at least two (more preferably at least three, and still more preferably at least five) ORFs selected from the group consisting of ORFs 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
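
The identity thresholds recited above (at least 75%, preferably 80%, more preferably 85% or more) can be illustrated with a minimal Python sketch. It assumes the two amino acid sequences have already been aligned, with gaps written as "-", and it counts identity over aligned non-gap positions only. The specification does not state which alignment program or denominator is intended, so the function names, the tier labels and that convention are assumptions made purely for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap.

    Assumption: the inputs are pre-aligned and of equal length, with "-" for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gap columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the tiers named in the text (labels are illustrative)."""
    if pct >= 85.0:
        return "more preferred (85% or more)"
    if pct >= 80.0:
        return "preferred (80%)"
    if pct >= 75.0:
        return "minimum (75%)"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical toy peptides, not sequences from FIG. 1.
    pct = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(pct, identity_tier(pct))
```

Other conventions (for example, dividing by the full alignment length including gaps) would give different values for the same pair, so any real comparison should state which one is used.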

[0009] Those skilled in the art will readily understand that the invention, having provided the polynucleotide sequences encoding polypeptides of the everninomicin biosynthetic pathway, also provides polynucleotides encoding fragments derived from such polypeptides. In one embodiment the invention provides an isolated nucleic acid comprising a nucleic acid that specifically hybridizes under stringent conditions to an ORF of the everninomicin biosynthesis gene cluster, and can substitute for the ORF to which it specifically hybridizes to direct the synthesis of an everninomicin. In certain embodiments this also includes nucleic acids that would stringently hybridize but for the degeneracy of the genetic code. In other words, if silent mutations could be made in the subject sequence so that it hybridizes to the indicated sequences under stringent conditions, it would be included in certain embodiments. The invention also provides an isolated gene cluster comprising ORFs encoding polypeptides sufficient to direct the assembly of an everninomicin or an everninomicin analogue.
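
The point about degeneracy and silent mutations can be made concrete with a short Python sketch: two DNA sequences that differ at several nucleotide positions may still encode the same polypeptide. The sketch uses the standard genetic code (amino acid assignments as in the standard table); the two example sequences and the function name are hypothetical and are not taken from FIG. 1.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, reading frame 1, codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    original = "ATGGCCCTGAAA"   # hypothetical coding sequence
    silent = "ATGGCGCTCAAG"     # same sequence with three synonymous codon changes
    differences = sum(a != b for a, b in zip(original, silent))
    print(translate(original), translate(silent), differences)
```

Here the two sequences differ at three nucleotide positions yet translate to the same four-residue peptide, which is the sense in which a sequence may fail to hybridize stringently only because of codon degeneracy.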

[0010] Moreover, the invention is understood to provide naturally occurring variants or derivatives of such polypeptides and fragments derived therefrom, such variants or derivatives resulting from the addition, deletion, or substitution of non-essential amino acids or conservative substitutions of essential amino acids as described herein. Particularly preferred nucleic acids comprise a nucleic acid that specifically hybridizes under stringent conditions to a nucleic acid encoding a polypeptide selected from the group consisting of ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, ORF 42, ORF 43, ORF 44, ORF 45, ORF 46, ORF 47, ORF 48, and ORF 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58 respectively). A particularly preferred isolated nucleic acid comprises a nucleic acid encoding a polypeptide selected from the group consisting of ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, ORF 42, ORF 43, ORF 44, ORF 45, ORF 46, ORF 47, ORF 48, and ORF 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58 respectively). The nucleic acid may comprise a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid encoding a polypeptide selected from the group consisting of ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, ORF 42, ORF 43, ORF 44, ORF 45, ORF 46, ORF 47, ORF 48, and ORF 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58). Certain embodiments of the invention specifically exclude one or more of ORFs 1 to 49.

[0011] This invention also provides for a polypeptide encoded by any one or more of the nucleic acids described herein.

[0012] Those skilled in the art would also readily understand that the invention, having provided the polynucleotide sequences of the entire genetic locus from M. carbonacea, further provides naturally occurring variants or homologs of the genes of the everninomicin biosynthetic locus from other bacteria of the order Actinomycetales. It is also understood that the invention, having provided the polynucleotide sequences of the entire genetic locus as well as the coding sequences, further provides polynucleotides which regulate the expression of the polypeptides of the biosynthetic pathway. Such regulating polynucleotides include but are not limited to promoter and enhancer sequences, as well as sequences antisense to any of the aforementioned sequences. The antisense molecules are regulators of gene expression in that they are used to suppress expression of the gene from which they are derived.
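
As a small illustration of what "antisense to" a given sequence means at the sequence level, the following Python sketch computes the reverse complement of a DNA strand. The function name and the example sequence are hypothetical; the sketch only shows the base-pairing relationship and says nothing about how an effective antisense regulator would be designed.

```python
# Watson-Crick complements for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the antisense (reverse-complement) strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = "ATGGCC"  # hypothetical fragment, not from FIG. 1
    print(reverse_complement(sense))
```

Applying the function twice returns the original strand, which is a quick sanity check when handling ORFs annotated on either strand.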

[0013] The gene cluster may be present in a host cell, preferably in a bacterial cell. Preferred families of bacterial cells include but are not limited to: a) bacteria of the family Micromonosporaceae, of which preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; b) bacteria of the family Streptomycetaceae, of which preferred genera include Streptomyces and Kitasatospora; and c) bacteria of the family Pseudonocardiaceae, of which preferred genera are Amycolatopsis, Kibdelosporangium, and Saccharopolyspora. The host cell is transformed with an exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the assembly of an everninomicin or an everninomicin analogue. In certain embodiments the heterologous nucleic acid may comprise only a portion of the gene cluster, but the cell will still be able to express an everninomicin. Expression cassettes and vectors comprising a polynucleotide as described herein, as well as cells transformed or transfected with such cassettes and vectors, are also within the scope of the invention.

[0014] The invention also provides methods of chemically modifying a biological molecule. The methods involve contacting a biological molecule that is a substrate for a polypeptide encoded by an everninomicin biosynthesis gene cluster ORF, with a polypeptide encoded by an everninomicin biosynthesis gene cluster ORF whereby the polypeptide chemically modifies the biological molecule. In one preferred embodiment, the polypeptide is an enzyme selected from the group consisting of an O-methyltransferase, an integral membrane antiporter, a methyltransferase, a blue copper oxidoreductase, a C-methyltransferase, a nucleotide binding protein, a mannosyltransferase, a sugar epimerase/reductase, an oxygenase, a tRNA/rRNA methylase, a 3-ketoacyl-[ACP]-synthase, a glycosyltransferase, an alpha-ketoglutarate-dependent dioxygenase, a halogenase, a glycosyltransferase, an acetoin dehydrogenase E1 alpha or beta subunit, a rhamnosyltransferase, a sugar dehydratase/epimerase, a sugar nucleotidyltransferase, a sugar 4,6-dehydratase, a sugar epimerase/ketoreductase, an iterative type I polyketide synthase, a hydrolase/phosphatase, a glucosyltransferase, a sugar ketoreductase, a sugar 2,3-dehydratase, a sugar dehydratase, a resistance rRNA methyltransferase, a flavoprotein oxidoreductase, a deoxyhexose aminotransferase, a sugar epimerase, a sugar ketoreductase, an endoglucanase, a transcriptional regulator and a glucokinase. In a preferred embodiment, the method involves contacting the biological molecule with at least two (preferably at least three or more) different polypeptides of everninomicin gene cluster ORFs 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58). The contacting may be in a host cell or the contacting can be ex vivo. The biological molecule can be an endogenous metabolite produced by the host cell or an exogenously supplied metabolite. In preferred embodiments, the host cell is a bacterial cell or eukaryotic cell (e.g., a mammalian cell, a yeast cell, a plant cell, a fungal cell, an insect cell, etc.). In certain preferred embodiments, the host cell synthesizes deoxyhexose precursors or a dichloroisoeverninic moiety for the biological molecule. In other preferred embodiments, the host cell synthesizes the nitrosugar evernitrose. In one preferred embodiment, the method comprises contacting the biological molecule with substantially all of the polypeptides of ORFs 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58) and the method produces an everninomicin or everninomicin analogue.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 illustrates contiguous nucleotide sequences and deduced amino acid sequences of the everninomicin biosynthetic locus from Micromonospora carbonacea (SEQ ID NOS: 1 to 58).

[0016] FIG. 2 illustrates the structure of some of the known everninomicins.

[0017] FIG. 3 illustrates a biosynthetic scheme for the production of deoxyhexose precursors for everninomicin biosynthesis.

[0018] FIG. 4 illustrates a biosynthetic scheme for the production of the nitrosugar evernitrose.

[0019] FIG. 5 illustrates a biosynthetic scheme for the production of the dichloroisoeverninic moiety that is found in the ester linkage to the sugar residue B of everninomicin.

DETAILED DESCRIPTION OF THE INVENTION

[0020] Contiguous nucleotide sequences and deduced amino acid sequences of the everninomicin biosynthetic locus from Micromonospora carbonacea are illustrated in FIG. 1 (SEQ ID NOS: 1 to 58). In particular, FIG. 1 shows a complete gene cluster formed of eight contiguous DNA sequences, which gene cluster regulates the biosynthesis of everninomicin. FIG. 1 further shows the amino acid sequences of the isolated polynucleotide coding regions which encode 49 polypeptides of the everninomicin biosynthetic pathway (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).

[0021] The contiguous nucleotide sequences are arranged such that, as found within the everninomicin biosynthetic locus, DNA contig 1 (SEQ ID NO 1) is adjacent to the 5′ end of DNA contig 2 (SEQ ID NO 3), which is in turn adjacent to DNA contig 3 (SEQ ID NO 4), etc. The ORFs represent open reading frames deduced from the nucleotide sequences. ORF 1 (SEQ ID NO 2) has been deduced from DNA contig 1 (SEQ ID NO 1); ORFs 2 to 4 (SEQ ID NOS: 5 to 7) have been deduced from DNA contig 3 (SEQ ID NO 4); ORFs 5 to 17 (SEQ ID NOS: 9 to 21) have been deduced from DNA contig 4 (SEQ ID NO 8); ORFs 18 to 30 (SEQ ID NOS: 23 to 35) have been deduced from DNA contig 5 (SEQ ID NO 22); ORFs 31 to 39 (SEQ ID NOS: 37 to 45) and the C-terminus of ORF 40 (SEQ ID NO 46) have been deduced from DNA contig 6 (SEQ ID NO 36); the N-terminus of ORF 40 (SEQ ID NO 48) has been deduced from DNA contig 7 (SEQ ID NO 47); ORFs 41 to 49 (SEQ ID NOS: 50 to 58) have been deduced from DNA contig 8 (SEQ ID NO 49). As pointed out in FIG. 1, some of the ORFs are incomplete. In addition, one nucleotide (at position 27 of DNA contig 6, SEQ ID NO 36) remains to be determined. The DNA contig coding regions giving rise to the ORFs are also shown in FIG. 1, along with the orientation of the ORFs (i.e., whether they are to be read off the positive (sense, coding) strand or the negative (antisense, non-coding) strand).
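
The correspondence between ORF numbers, SEQ ID NOS and DNA contigs described above can be written out as a small lookup table. The Python sketch below is only a bookkeeping aid built from the ranges recited in this specification; it assumes that ORFs 2 to 4 correspond to SEQ ID NOS 5 to 7, consistent with the list "2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58", and all variable names are hypothetical.

```python
# SEQ ID NO of each DNA contig (contig number -> SEQ ID NO), as listed for FIG. 1.
CONTIG_SEQ_ID = {1: 1, 2: 3, 3: 4, 4: 8, 5: 22, 6: 36, 7: 47, 8: 49}

# (first ORF, last ORF, SEQ ID NO of first ORF, contig number)
_BLOCKS = [
    (1, 1, 2, 1),
    (2, 4, 5, 3),     # assumption: ORFs 2 to 4 are SEQ ID NOS 5 to 7
    (5, 17, 9, 4),
    (18, 30, 23, 5),
    (31, 39, 37, 6),
    (41, 49, 50, 8),
]

# ORF number -> list of (SEQ ID NO, contig number); ORF 40 is split over two contigs.
ORF_MAP = {}
for first_orf, last_orf, first_seq_id, contig in _BLOCKS:
    for offset, orf in enumerate(range(first_orf, last_orf + 1)):
        ORF_MAP[orf] = [(first_seq_id + offset, contig)]
ORF_MAP[40] = [(46, 6), (48, 7)]  # C-terminus on contig 6, N-terminus on contig 7

PROTEIN_SEQ_IDS = sorted(seq_id for parts in ORF_MAP.values() for seq_id, _ in parts)

if __name__ == "__main__":
    print(len(ORF_MAP), "ORFs,", len(PROTEIN_SEQ_IDS), "polypeptide SEQ ID NOS")
    print("ORF 28 ->", ORF_MAP[28], " ORF 32 ->", ORF_MAP[32])
```

Under these assumptions the 8 contig entries and the 50 polypeptide entries (ORF 40 contributes two) together account for SEQ ID NOS 1 to 58 with no overlap.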

[0022] A deposit of three strains of E. coli DH10B cells, each harbouring a cosmid clone of the everninomicin locus, was made on Jan. 24, 2001 with the International Depositary Authority of Canada (IDAC), 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada according to the provisions of the Budapest Treaty. The deposits were assigned accession nos. IDAC 240101-1, IDAC 240101-2 and IDAC 240101-3. All restrictions on the availability to the public of the above IDAC deposits will be irrevocably removed upon the granting of a patent on this application.

[0023] Everninomicin is naturally produced by a number of microorganisms of the order Actinomycetales. Given the potential medical importance of this class of antibiotics, the genetic locus encoding the biosynthetic pathway for everninomicin production was isolated and sequenced from one known producer, Micromonospora carbonacea subspecies aurantiaca (strain number NRRL 2997, obtained from the Agricultural Research Service Culture Collection of the United States Department of Agriculture; everninomicin production by this strain is described in U.S. Pat. No. 3,499,078). The newly discovered locus encodes 49 individual proteins (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58) involved in the biosynthesis of everninomicin by this organism. The full-length locus and individual cloned genes are useful for a variety of purposes relating to synthesis of antibiotics of the orthosomycin class.

[0024] The entire everninomicin biosynthetic locus spans approximately 60 kb. Analysis of this 60 kb DNA sequence reveals the presence of individual genes encoding 49 individual proteins. Three of the genes show strong homology to the Streptomyces viridochromogenes avilamycin biosynthetic genes aviD, aviE and aviM, previously demonstrated to be involved in the biosynthesis of avilamycin, a member of the orthosomycin class of antibiotics (Gaisser et al., 1997, J. Bacteriol., Vol. 179, pp. 6271-6278). The gene encoding ORF 28 of FIG. 1 (SEQ ID NO 33) is homologous to the aviD gene, the gene encoding ORF 29 of FIG. 1 (SEQ ID NO 34) is homologous to the aviE gene, and the gene encoding ORF 32 of FIG. 1 (SEQ ID NO 38) is homologous to the aviM gene.

[0025] The functions of the 49 individual proteins of the everninomicin biosynthetic locus were assessed by computer comparison of each protein with proteins found in the GenBank database of protein sequences (National Center for Biotechnology Information, National Library of Medicine, Bethesda, Md. USA) using the BLASTP algorithm (Altschul et al., 1997, Nucleic Acids Res. Vol. 25, pp. 3389-3402). Significant amino acid sequence homologies and proposed functions found for each protein in the everninomicin locus are shown in Table 1.

TABLE 1

ORF | # aa | Proposed function | GenBank homology | Probability | % identity | % similarity | Proposed function of GenBank match
1 | 250 | O-methyltransferase | AAD41819 | 5.00E−83 | 55 | 71 | TylF 3″″-O-methyltransferase in tylosin biosynthetic locus of Streptomyces fradiae
 | | | BAA03670 | 3.00E−80 | 54 | 71 | MycF mycinamicin III O-methyltransferase in the mycinamicin biosynthetic locus of Micromonospora griseorubida
 | | | AAG29794 | 1.00E−79 | 56 | 70 | CumN O-methyltransferase in coumermycin A1 biosynthetic locus of Streptomyces rishiriensis
 | | | AAF67509 | 2.00E−79 | 56 | 70 | NovP O-methyltransferase in the novobiocin biosynthetic locus of Streptomyces spheroides
2 | 345 (partial) | integral membrane antiporter | AAF26906 | 6.00E−38 | 31 | 48 | protein similar to Na/H and drug/H antiporters in epothilone biosynthetic locus of Sorangium cellulosum
 | | | CAB45049 | 2.00E−35 | 31 | 54 | putative integral membrane ion antiporter in chloroeremomycin biosynthetic locus of Amycolatopsis orientalis
 | | | BAA16991 | 6.00E−33 | 26 | 49 | Synechocystis sp. Na/H antiporter
3 | 385 | methyltransferase | BAA79525 | 6.00E−15 | 28 | 41 | hypothetical protein in Aeropyrum pernix with homology to N-6 adenine-specific DNA methylases
 | | | CAB88946 | 6.00E−05 | 31 | 40 | putative methyltransferase in Streptomyces coelicolor
4 | 480 (partial) | blue copper oxidoreductase | CAB12449 | 1.00E−60 | 33 | 44 | Bacillus subtilis spore coat protein involved in brown pigmentation during sporogenesis
 | | | BAA02123 | 6.00E−60 | 35 | 49 | bilirubin oxidase from Myrothecium verrucaria
 | | | CAB75422 | 7.00E−57 | 34 | 47 | polyphenol oxidase from Acremonium murorum
 | | | AAA86668 | 3.00E−35 | 26 | 37 | PhsA phenoxazinone synthase from Streptomyces antibioticus
5 | 274 | methyltransferase | AAF09939 | 9.00E−05 | 53 | 64 | probable methyltransferase, BioC family, from Deinococcus radiodurans
 | | | AAC01738 | 7.00E−05 | 35 | 45 | methyltransferase in rifamycin biosynthetic locus of Amycolatopsis mediterranei
 | | | CAB93437 | 3.00E−04 | 42 | 70 | putative methyltransferase from Streptomyces coelicolor
6 | 414 | C-methyltransferase | AAD41823 | 4.00E−79 | 43 | 55 | TylCIII NDP-hexose 3-C-methyltransferase in the tylosin biosynthetic locus of Streptomyces fradiae
 | | | CAA42926 | 4.00E−72 | 41 | 55 | protein in the erythromycin biosynthetic locus of Saccharopolyspora erythraea
 | | | AAG29803 | 5.00E−46 | 31 | 49 | CumW C-methyltransferase in the coumermycin A1 biosynthetic locus of Streptomyces rishiriensis
 | | | AAF01816 | 1.00E−45 | 31 | 47 | SnoG protein in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | AAF67514 | 6.00E−44 | 30 | 47 | NovU C-methyltransferase in the novobiocin biosynthetic locus of Streptomyces spheroides
7 | 357 | O-methyltransferase | AAD12164 | 3.00E−79 | 45 | 59 | TylE O-methyltransferase in the tylosin biosynthetic locus of Streptomyces fradiae
 | | | CAA12021 | 6.00E−72 | 45 | 57 | SnogY O-methylase in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | CAA05644 | 7.00E−52 | 42 | 56 | OleY protein in the oleandomycin biosynthetic locus of Streptomyces antibioticus
8 | 292 | mannosyltransferase | AAB89517 | 1.00E−05 | 26 | 47 | galactosyltransferase from Archaeoglobus fulgidus
 | | | CAB58332 | 6.00E−05 | 26 | 38 | putative glycosyl transferase from Streptomyces coelicolor
 | | | AAF12269 | 3.00E−04 | 25 | 45 | mannosyl transferase from Deinococcus radiodurans
9 | 137 | nucleotide-binding protein | AAD45266 | 3.60E+00 | 34 | 42 | Pseudomonas aeruginosa WbjC putative nucleotide-binding protein involved in O-antigen (sugar) biosynthesis
 | | | AAB63947 | 6.20E+00 | 38 | 60 | Streptococcus pneumoniae SulD bifunctional aldolase-pyrophosphokinase
10 | 314 | sugar epimerase/reductase | CAA12010 | 1.00E−51 | 42 | 53 | SnogG dTDP-4-keto-6-deoxyhexose reductase in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | AAB63047 | 4.00E−46 | 38 | 52 | DnmV thymidine diphospho-4-keto-2,3,6-trideoxyhexulose reductase in the daunorubicin biosynthetic locus of Streptomyces peucetius
 | | | AAD13561 | 5.00E−45 | 39 | 50 | LanZ3 NDP-hexose 4-keto reductase in the landomycin biosynthetic locus of Streptomyces cyanogenus
 | | | AAF72549 | 4.00E−43 | 39 | 48 | UrdZ3 NDP-hexose 4-ketoreductase in the urdamycin biosynthetic locus of Streptomyces fradiae
11 | 285 | O-methyltransferase | BAA32132 | 2.00E−68 | 50 | 61 | methyltransferase in Streptomyces griseus
 | | | AAB00531 | 2.00E−63 | 46 | 59 | DmpM O-demethylpuromycin-O-methyltransferase in the puromycin biosynthetic locus of Streptomyces alboniger
 | | | AAD32742 | 8.00E−34 | 34 | 47 | MmcR O-methyltransferase in the mitomycin biosynthetic locus of Streptomyces lavendulae
 | | | AAA67518 | 4.00E−32 | 33 | 48 | TcmN O-methyltransferase in the tetracenomycin biosynthetic locus of Streptomyces glaucescens
12 | 276 | oxygenase | CAA07766 | 5.00E+00 | 27 | 39 | MtmOI oxygenase in the mithramycin biosynthetic locus of Streptomyces argillaceus
13 | 265 | tRNA/rRNA methylase | AAG32066 | 3.00E−73 | 54 | 70 | rRNA methyltransferase AviRb involved in avilamycin A resistance in Streptomyces viridochromogenes
 | | | AAF10591 | 7.00E−28 | 36 | 51 | rRNA methylase from Deinococcus radiodurans
 | | | AAF73591 | 1.00E−23 | 31 | 48 | SpoU rRNA methylase family protein from Chlamydia muridarum
 | | | AAC68000 | 1.00E−22 | 30 | 48 | SpoU family rRNA methylase from Chlamydia trachomatis
 | | | AAD18670 | 2.00E−22 | 27 | 48 | SpoU-1 rRNA methylase from Chlamydophila pneumoniae
14 | 344 | 3-ketoacyl-[ACP]-synthase | AAG29787 | 2.00E−76 | 43 | 58 | CumJ 3-ketoacyl-[ACP]-synthase in the coumermycin A1 biosynthetic locus of Streptomyces rishiriensis
 | | | AAA65208 | 2.00E−61 | 38 | 54 | DpsC daunorubicin-doxorubicin polyketide synthase from Streptomyces peucetius
 | | | CAB71914 | 3.00E−70 | 40 | 58 | beta-keto acyl synthase III homolog from Streptomyces coelicolor
 | | | AAF70109 | 5.00E−54 | 37 | 50 | AknE2 ketoacyl synthase involved in aclacinomycin biosynthesis in Streptomyces galilaeus
15 | 240 | methyltransferase | CAA70016 | 5.00E−04 | 33 | 41 | StsG methyltransferase involved in N-methyl-L-glucosamine pathway in streptomycin biosynthetic locus of Streptomyces griseus
 | | | AAG06559 | 2.00E−03 | 24 | 41 | UbiG 3-demethylubiquinone-9 3-methyltransferase from Pseudomonas aeruginosa
 | | | AAF09618 | 5.00E−03 | 27 | 47 | putative methyltransferase from Deinococcus radiodurans
 | | | AAD28458 | 1.50E−02 | 27 | 43 | MitN methyltransferase in the mitomycin biosynthetic locus of Streptomyces lavendulae
16 | 380 | glycosyltransferase | AAF00209 | 5.00E−80 | 44 | 58 | UrdGT2 glycosyl transferase in the urdamycin A biosynthetic locus of Streptomyces fradiae
 | | | AAD13553 | 7.00E−78 | 43 | 59 | LanGT2 glycosyl transferase in the landomycin biosynthetic locus of Streptomyces cyanogenus
 | | | CAA09635 | 8.00E−70 | 42 | 55 | Gra-orf14 putative glycosyl transferase in the granaticin biosynthetic locus of Streptomyces violaceoruber
 | | | AAC01731 | 3.00E−58 | 37 | 51 | dNTP-hexose glycosyl transferase in the rifamycin biosynthetic locus of Amycolatopsis mediterranei
17 | 405 | unknown | none | | | |
18 | 296* (partial) | alpha-ketoglutarate-dependent dioxygenase | AAC71711 | 0.005 | 27 | 42 | HtxA putative alpha-ketoglutarate-dependent hypophosphite dioxygenase from Pseudomonas stutzeri
19 | 243 | methyltransferase | JC5319 | 9.90E−02 | 43 | 61 | TlrD macrolide-lincosamide-streptogramin B resistance determinant from Streptomyces fradiae
 | | | CAB45043 | 2.20E−01 | 36 | 49 | putative rRNA methylase from Amycolatopsis orientalis
 | | | AAF86398 | 3.80E−01 | 26 | 35 | FkbM 31-O-methyltransferase in the FK520 biosynthetic locus of Streptomyces hygroscopicus var. ascomyceticus
 | | | AAC44360 | 3.80E−01 | 30 | 40 | FkbM 31-O-demethyl-FK506 methyltransferase in the FK506 biosynthetic locus of Streptomyces sp.
20 | 482 | halogenase | CAA11780 | 6.00E−60 | 32 | 50 | protein similar to non-heme oxygenase/halogenase in chloroeremomycin biosynthetic locus of Amycolatopsis orientalis
 | | | CAA76550 | 5.00E−59 | 32 | 49 | OxyD putative halogenase in the balhimycin biosynthetic locus of Amycolatopsis mediterranei
 | | | AAG38844 | 2.00E−34 | 31 | 47 | putative reductase/halogenase in the xanthomonadin biosynthetic locus of Xanthomonas oryzae
 | | | AAD24884 | 7.00E−29 | 27 | 43 | PltA putative halogenase in the pyoluteorin biosynthetic locus of Pseudomonas fluorescens
21 | 438 | glycosyltransferase | AAC64928 | 2.00E−44 | 32 | 44 | MtmGI glycosyltransferase involved in mithramycin biosynthesis in Streptomyces argillaceus
 | | | AAD55583 | 2.00E−43 | 32 | 46 | MtmGIII glycosyltransferase involved in mithramycin biosynthesis in Streptomyces argillaceus
 | | | AF077869 | 2.00E−41 | 32 | 44 | MtmGIV glycosyltransferase involved in mithramycin biosynthesis in Streptomyces argillaceus
 | | | AAC68677 | 3.00E−34 | 28 | 42 | DesVII glycosyl transferase in the methymycin/pikromycin biosynthetic locus of Streptomyces venezuelae
22 | 325 | acetoin dehydrogenase E1 alpha subunit | AAG07537 | 8.00E−71 | 48 | 60 | probable dehydrogenase E1 component from Pseudomonas aeruginosa
 | | | AAA21744 | 8.00E−69 | 46 | 61 | TPP-dependent acetoin dehydrogenase E1 alpha-subunit from Clostridium magnum
 | | | AAA21948 | 3.00E−65 | 46 | 57 | Acetoin:DCPIP oxidoreductase-alpha from Ralstonia eutropha
23 | 320 | acetoin dehydrogenase E1 beta subunit | AAA18916 | 2.00E−53 | 38 | 55 | Acetoin:DCPIP oxidoreductase beta subunit from Pelobacter carbinolicus
 | | | AAG07538 | 8.00E−53 | 40 | 54 | Acetoin catabolism protein AcoB from Pseudomonas aeruginosa
 | | | AAA21745 | 6.00E−52 | 37 | 57 | TPP-dependent acetoin dehydrogenase beta-subunit from Clostridium magnum
24 | 337 | rhamnosyltransferase | CAB50099 | 2.00E−18 | 31 | 48 | rhamnosyl transferase related protein from Pyrococcus abyssi
 | | | AAF04375 | 5.00E−18 | 29 | 42 | WbbL dTDP-Rha:a-D-GlcNAc-diphosphoryl polyprenol a-3-L-rhamnosyl transferase from Mycobacterium smegmatis
 | | | AAF12271 | 3.00E−16 | 27 | 45 | putative rhamnosyltransferase from Deinococcus radiodurans
 | | | AAB66522 | 2.00E−15 | 24 | 44 | putative rhamnosyl transferase involved in capsular polysaccharide biosynthesis in Streptococcus pneumoniae
25 | 350 | unknown | none | | | |
26 | 252 | alpha-ketoglutarate-dependent dioxygenase | AAF01812 | 1.00E−12 | 28 | 41 | SnoK protein in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | AAC71711 | 3.00E−11 | 23 | 42 | HtxA putative alpha-ketoglutarate-dependent hypophosphite dioxygenase from Pseudomonas stutzeri
 | | | AAB81835 | 3.00E−06 | 23 | 35 | peroxisomal phytanoyl-CoA alpha-hydroxylase from Mus musculus
 | | | AAF15971 | 2.00E−05 | 23 | 38 | 2-oxoglutarate dependent peroxisomal phytanoyl-CoA hydroxylase (dioxygenase) from Rattus norvegicus
27 | 309 | sugar dehydratase/epimerase | AAG08838 | 4.00E−46 | 38 | 53 | Gmd GDP-mannose 4,6-dehydratase from Pseudomonas aeruginosa
 | | | AAC38668 | 7.00E−46 | 37 | 51 | LpsA putative GDP-mannose-4,6-dehydratase predicted to be involved in S-layer lipopolysaccharide biosynthesis in Caulobacter crescentus
 | | | AAC44117 | 6.00E−44 | 37 | 51 | Gca GDP-D-mannose dehydratase involved in common antigen biosynthesis in Pseudomonas aeruginosa
 | | | AAB84839 | 7.00E−43 | 34 | 50 | GDP-D-mannose dehydratase in Methanothermobacter thermoautotrophicus
 | | | AAD20373 | 2.00E−42 | 36 | 50 | MdhtA GDP-D-mannose-dehydratase found in glycopeptolipid biosynthetic locus of Mycobacterium avium
28 | 355 | sugar nucleotidyltransferase | P08075 | 1.00E−126 | 61 | 77 | StrD glucose-1-phosphate thymidylyltransferase found in the streptomycin biosynthetic locus in Streptomyces griseus
 | | | T30872 | 1.00E−125 | 60 | 78 | AviD dNDP-glucose synthase in the avilamycin biosynthetic locus of Streptomyces viridochromogenes
 | | | AAD28517 | 1.00E−124 | 59 | 77 | BlmD streptomycin strD protein homolog in the bluensomycin biosynthetic locus of Streptomyces bluensis
 | | | T48866 | 1.00E−123 | 60 | 77 | MtmD glucose-1-phosphate thymidylyltransferase in the mithramycin biosynthetic locus of Streptomyces argillaceus
29 | 329 | sugar 4,6-dehydratase | T30873 | 1.00E−139 | 74 | 82 | AviE dNDP-glucose dehydratase in the avilamycin biosynthetic locus of Streptomyces viridochromogenes
 | | | AAG18457 | 1.00E−123 | 66 | 75 | AprE dTDP-glucose 4,6-dehydratase from Streptomyces tenebrarius
 | | | AAA68211 | 1.00E−123 | 66 | 75 | TDP-D-glucose-4,6-dehydratase in the erythromycin biosynthetic locus of Saccharopolyspora erythraea
 | | | BAA84593 | 1.00E−115 | 63 | 76 | AveBII dTDP-glucose 4,6-dehydratase in the avermectin biosynthetic locus of Streptomyces avermitilis
 | | | AAC68681 | 1.00E−114 | 62 | 74 | DesIV TDP-glucose-4,6-dehydratase in the methymycin/pikromycin biosynthetic locus of Streptomyces venezuelae
30 | 342 | sugar epimerase/ketoreductase | AAD35594 | 6.00E−43 | 38 | 53 | UDP-glucose 4-epimerase from Thermotoga maritima
 | | | AAG07455 | 3.00E−37 | 37 | 51 | probable epimerase from Pseudomonas aeruginosa
 | | | A71183 | 2.00E−34 | 33 | 46 | probable UDP-glucose 4-epimerase from Pyrococcus horikoshii
 | | | CAB49227 | 1.00E−33 | 33 | 46 | GalE-1 UDP-glucose 4-epimerase from Pyrococcus abyssi
31 | 354 | alpha-ketoglutarate-dependent dioxygenase | AAF01812 | 1.00E−10 | 26 | 41 | SnoK protein in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | AAB81835 | 3.00E−07 | 29 | 43 | peroxisomal phytanoyl-CoA alpha-hydroxylase from Mus musculus
 | | | AAC71711 | 4.00E−06 | 25 | 41 | HtxA putative alpha-ketoglutarate-dependent hypophosphite dioxygenase from Pseudomonas stutzeri
32 | 1267 | iterative type I polyketide synthase | CAA72713 | 0.00E+00 | 65 | 75 | AviM orsellinic acid synthase in the avilamycin biosynthetic locus of Streptomyces viridochromogenes
 | | | BAA20102 | 0.00E+00 | 40 | 56 | 6-methylsalicylic acid synthase from Aspergillus terreus
 | | | S13178 | 0.00E+00 | 41 | 55 | 6-methylsalicylic acid synthase from Penicillium griseofulvum
33 | 303 | hydrolase/phosphatase | AAF09992 | 1.00E−05 | 31 | 43 | hydrolase of the CbbY/CbbZ/GpH/YieH family from Deinococcus radiodurans
 | | | AAG19324 | 1.00E−05 | 32 | 46 | p-nitrophenyl phosphatase from Halobacterium sp.
 | | | AAC76410 | 4.00E−03 | 33 | 53 | phosphoglycolate phosphatase from Escherichia coli
34 | 307 | sugar epimerase/ketoreductase | AAD45554 | 2.00E−52 | 43 | 55 | Spcl putative dNDP-glucose-4,6-dehydratase in the spectinomycin biosynthetic locus of Streptomyces flavopersicus
 | | | CAA18814 | 1.00E−23 | 32 | 43 | putative sugar dehydratase from Mycobacterium leprae
 | | | AAD35594 | 2.00E−23 | 28 | 44 | UDP-glucose 4-epimerase from Thermotoga maritima
 | | | BAA84595 | 2.00E−17 | 30 | 42 | AveBIV dTDP-4-keto-6-deoxy-L-hexose 4-reductase in the avermectin biosynthetic locus of Streptomyces avermitilis
35 | 295 | glycosyltransferase | S37028 | 6.00E−05 | 28 | 42 | ExoM rhizobium succinoglycan biosynthesis glycosyltransferase from Sinorhizobium meliloti
 | | | AAB90621 | 2.20E−01 | 25 | 42 | ExoM succinoglycan biosynthesis protein from Archaeoglobus fulgidus
36 | 341 | sugar ketoreductase | AAF73453 | 6.00E−91 | 55 | 69 | AknQ putative 3-ketoreductase in the Streptomyces galilaeus aclacinomycin biosynthetic locus
 | | | AAD13550 | 2.00E−87 | 53 | 65 | LanT oxidoreductase homolog found in the landomycin biosynthetic locus of Streptomyces cyanogenus
 | | | AAA83425 | 3.00E−85 | 48 | 64 | RdmF oxidoreductase of Streptomyces purpurascens
 | | | AAF59931 | 4.00E−82 | 50 | 65 | dTDP-3,4-diketo-2,6-dideoxyglucose 3-ketoreductase involved in the 2-deoxygenation step in dTDP-L-oleandrose biosynthesis
37 | 470 | sugar 2,3-dehydratase | AAD55451 | 1.00E−127 | 52 | 64 | OleV involved in the C-2 deoxygenation step in dTDP-L-oleandrose biosynthesis in Streptomyces antibioticus
 | | | CAB96551 | 1.00E−122 | 52 | 63 | MtmV D-olivose, D-oliose and D-mycarose 2,3-dehydratase in the mithramycin biosynthetic locus of Streptomyces argillaceus
 | | | T46668 | 1.00E−119 | 51 | 64 | SnogH probable 2,3-dehydratase in the nogalamycin biosynthetic locus of Streptomyces nogalater
 | | | AAD13549 | 1.00E−118 | 50 | 63 | LanS NDP-hexose 2,3-dehydratase homolog in the landomycin biosynthetic locus of Streptomyces cyanogenus
38 | 346 | sugar dehydratase | AAF71765 | 1.00E−120 | 63 | 77 | NysDIII putative dGDP-mannose-4,6-dehydratase in the nystatin biosynthetic locus of Streptomyces noursei
 | | | AAG35360 | 4.00E−96 | 55 | 71 | Gmd GDP-mannose 4,6-dehydratase from Aneurinibacillus thermoaerophilus
 | | | AAD10232 | 5.00E−93 | 52 | 69 | putative GDP-D-mannose dehydratase from Anabaena sp.
 | | | AAC44117 | 3.00E−89 | 50 | 68 | Gca GDP-D-mannose dehydratase involved in common antigen biosynthesis in Pseudomonas aeruginosa
 | | | AAC38668 | 2.00E−88 | 49 | 67 | LpsA putative GDP-mannose-4,6-dehydratase predicted to be involved in S-layer lipopolysaccharide biosynthesis in Caulobacter crescentus
 | | | AAF07199 | 3.00E−87 | 49 | 66 | Gmd1 GDP-D-mannose 4,6-dehydratase from Arabidopsis thaliana
39 | 277 | resistance rRNA methyltransferase | AAG32067 | 2.00E−62 | 52 | 65 | AviRa rRNA methyltransferase involved in avilamycin A resistance in Streptomyces viridochromogenes
40 | 159*, 49* (partial) | sugar epimerase/ketoreductase | AAD35594 | 2.00E−31 | 43 | 63 | UDP-glucose 4-epimerase from Thermotoga maritima
 | | | C70562 | 2.00E−29 | 45 | 59 | probable dTDP-glucose 4-epimerase from Mycobacterium tuberculosis
 | | | AAB98196 | 4.00E−28 | 43 | 61 | GalE UDP-glucose 4-epimerase from Methanococcus jannaschii
 | | | CAA18814 | 2.00E−27 | 43 | 57 | putative sugar dehydratase from Mycobacterium leprae
41 | 400 | flavoprotein oxidoreductase | CAA51670 | 1.00E−108 | 55 | 68 | ORF3 flavoprotein in the daunorubicin biosynthetic locus of Streptomyces griseus
 | | | AAB63045 | 4.00E−56 | 39 | 47 | DnmZ putative flavoprotein required for biosynthesis of the daunorubicin precursor thymidine diphospho-L-daunosamine in Streptomyces peucetius
42 | 373 | deoxyhexose aminotransferase | CAA11782 | 1.00E−157 | 73 | 82 | PCZA361.5 sugar biosynthesis gene in the chloroeremomycin biosynthetic locus of Amycolatopsis orientalis
 | | | AAG13910 | 1.00E−151 | 70 | 83 | MegDII TDP-3-keto-6-deoxyhexose 3-aminotransaminase in the megalomicin biosynthetic locus of Micromonospora megalomicea
 | | | AAF73462 | 1.00E−145 | 74 | 81 | AknZ putative aminotransferase in the aclacinomycin biosynthetic locus of Streptomyces galilaeus
 | | | AAF01821 | 1.00E−143 | 73 | 81 | SnogI putative aminotransferase in the nogalamycin biosynthetic locus of Streptomyces nogalater
43 | 416 | C-methyltransferase | CAA11777 | 1.00E−159 | 67 | 79 | PCZA361.22 sugar biosynthesis gene in
the chloroeremomycin biosynthetic locus of Amycolatopsis orientalis AAC38444 1.00E−152 66 77 DnrX daunorubicin/doxorubicin biosynthesis enzyme from Streptomyces peucetius CAB96549 2.00E−66 37 51 MtmC D-mycarose 3-C-methyltransferase in the mithramycin biosynthetic locus of Streptomyces argillaceus AAG29803 7.00E−62 34 50 CumW C-methyltransferase in the coumermycin A1 biosynthetic locus of Streptomyces rishiriensis 44 207 sugar epimerase AAB63046 7.00E−68 63 75 DnmU putative epimerase involved in the biosynthesis of daunorubicin precursor TDP-L-daunosamine in Streptomyces peucetius AAF70101 2.00E−64 60 73 AknL dTDP-4-keto-6-deoxyhexose 3,5-epimerase in the aclacinomycin biosynthetic locus of Streptomyces galilaeus CAA11781 8.00E−64 58 72 Protein similar to epimerase in the chloroeremomycin biosynthetic locus of Amycolatopsis orientalis CAA12011 1.00E−60 60 72 SnogF 3,5-epimerase in the nogalamycin biosynthetic locus of Streptomyces nogalater 45 343 sugar ketoreductase AAG13913 3.00E−86 54 64 MegDV TDP-4-keto-6-deoxyhexose 4-ketoreductase in the megalomicin biosynthetic locus of Micromonospora megalomicea CAA11764 2.00E−84 51 71 protein similar to dTDP-dehydrogenase in the chloroeremomycin biosynthetic locus of Amycolatopsis orientalis BAA84595 1.00E−79 53 63 AveBlVdTDP-4-keto-6-deoxy-L-hexose 4-reductase in the avermectin biosynthetic locus of Streptomyces avermitilis AAB84071 3.00E−73 48 63 EryBIV oxidoreductase involved in L-mycarose biosynthesis in the erythromycin biosynthetic locus of Saccharopolyspora erythraea 46 306 unknown None 47 518 endoglucanase AAA23084 2.00E−45 52 63 endoglucanase from Cellulomonas fimi CAC16970 4.00E−41 35 47 putative secreted endoglucanase from Streptomyces coeticolor AAA62211 5.00E−36 50 62 beta-1,4-exocellulase precursor from Thermobifida fusca 48 286 transcriptional CAB61919 2.00E−56 45 58 putative lacl-family transcriptional regulator regulator in Streptomyces coelicolor CAA20609 8.00E−56 46 59 putative lacl-family 
transcriptional regulator in Streptomyces coelicolor CAB65654 2.00E−28 28 48 putative repressor of maltose transport genes in Alicyclobacillus acidocaldarius AAD51826 4.00E−28 34 49 ThuR member of the Lacl-GalR family regulatory proteins in Sinorhizobium meliloti 49 340 glucokinase CAB95296 4.00E−29 34 48 probable sugar kinase from Streptomyces coelicolor CAB65576 6.00E−28 37 44 putative transcriptional regulatory protein with similarity to glucokinase in Streptomyces coelicolor BAB05144 2.00E−27 31 47 glucose kinase from Bacillus halodurans AAD36537 9.00E−26 29 45 glucokinase from Thermotoga maritima

[0026] The everninomicin backbone is composed of eight saccharide residues joined by glycosidic and orthoester linkages. Many of the proteins encoded by the everninomicin locus are likely to be involved in the biosynthesis of the sugar precursors and their subsequent joining and modification.

[0027] Five of the eight saccharide residues of everninomicin (residues A-E of FIG. 2) are deoxyhexoses and are likely to be derived from D-glucose-6-phosphate. Deoxyhexoses are common constituents of microbial secondary metabolites. The first two steps in the biosynthesis of many deoxysugars are the synthesis of dNDP-D-glucose and its conversion to dNDP-4-keto-6-deoxyglucose, catalyzed respectively by dNDP-glucose synthases and dNDP-glucose dehydratases (Liu and Thorson, 1994, Annu. Rev. Microbiol., Vol. 48, pp. 223-256). ORF 28 (SEQ ID NO 33) is similar to many bacterial dNDP-glucose synthases, while ORF 29 (SEQ ID NO 34) is similar to many bacterial dNDP-glucose dehydratases. These two proteins are likely to be involved in generating 6-deoxyhexose precursors for incorporation into everninomicin. Sugar residues at positions A-C, and occasionally D, also lack C-2 hydroxyl groups (see FIG. 2). ORFs 36 and 37 (SEQ ID NOS 42 and 43) encode proteins that are similar to bacterial proteins known to be involved in C-2 deoxygenation and are therefore likely to be involved in the generation of 2,6-dideoxyhexose precursors. ORFs 10, 27, 30, 34, 38 and 40 (SEQ ID NOS 14, 32, 35, 40, 44, and 46) are similar to bacterial proteins that catalyze dehydration, epimerization and/or ketoreduction of deoxyhexose precursors and are likely to catalyze 4-ketoreduction to generate sugars with the appropriate C-4 stereochemistry for everninomicin biosynthesis. A biosynthetic scheme for the production of deoxyhexose precursors for everninomicin biosynthesis is shown in FIG. 3.

[0028] The everninomicins are distinguished from other orthosomycin antibiotics by the presence of a nitrogen-containing sugar residue (residue A of FIG. 2). ORFs 41-45 (SEQ ID NOS 50 to 54) constitute a cluster of ORFs with strong similarity to proteins involved in the biosynthesis of aminodeoxyhexoses. In particular, these ORFs are similar to proteins proposed to catalyze the synthesis of the 3-amino-3-methyl-2,3,6-trideoxyhexose residue of chloroeremomycin (van Wageningen et al., 1998, Chem. & Biol., Vol. 5, pp. 155-162) and proteins involved in the synthesis of the 3-amino-2,3,6-trideoxyhexose residue of daunorubicin (Olano et al., 1999, Chem. & Biol., Vol. 6, pp. 845-855). ORFs 41-45 (SEQ ID NOS 50 to 54) are therefore likely to catalyze the biosynthesis of a 3-amino-3-methyl-2,3,6-trideoxyhexose intermediate that would subsequently be modified by O-methyl transfer and amino group oxidation to yield the evernitrose nitrosugar residue. Two proteins (ORFs 1, 7; SEQ ID NOS 2 and 11) found in the everninomicin locus are similar to bacterial proteins that catalyze O-methyl transfer to deoxyhexose groups of secondary metabolites and may catalyze O-methyl transfer in evernitrose biosynthesis. ORF 4 (SEQ ID NO 7) encodes an unusual oxidoreductase that shows similarity to bacterial blue-copper oxidoreductases involved in oxidizing nitrogen-containing compounds and as such provides a likely candidate for the amine oxidase required for the biosynthesis of evernitrose. A scheme for the biosynthesis of the nitrosugar evernitrose is shown in FIG. 4.

[0029] Five proteins (ORFs 8, 16, 21, 24 and 35; SEQ ID NOS 12, 20, 26, 29, and 41) are similar to bacterial glycosyltransferases and are therefore likely to catalyze the joining of saccharide precursors via glycosidic linkages to form the backbone oligosaccharide structure that is characteristic of the orthosomycins. Among the glycosyltransferases encoded by the everninomicin locus, one (ORF 16; SEQ ID NO 20) shows the greatest similarity to enzymes known to catalyze the transfer of aminodeoxyhexose residues. This glycosyltransferase is therefore likely to catalyze the incorporation of the aminodeoxyhexose precursor that is subsequently converted to the nitrosugar evernitrose. The protein encoded by ORF 35 (SEQ ID NO 41) is the most unusual of the glycosyltransferases and is therefore likely to form the unusual C-1 to C-1′ linkage that is characteristic of the orthosomycins.

[0030] The everninomicins may contain as many as seven O-methyl groups (see FIG. 2). It is significant, then, that the everninomicin locus encodes seven proteins (ORFs 1, 3, 5, 7, 11, 15 and 19; SEQ ID NOS 2, 6, 9, 11, 15, 19, and 24) that show similarity to O-methyltransferases. It is likely that each of these proteins catalyzes a specific O-methylation reaction during the course of everninomicin biosynthesis. ORFs 1 and 7 (SEQ ID NOS 2 and 11) are discussed above as possible enzymes responsible for methylating the C-4 hydroxyl group of the nitrosugar evernitrose. ORF 11 (SEQ ID NO 15) is discussed in more detail below and is likely to catalyze methylation of the phenolic hydroxyl group found on the dichloroisoeverninic acid moiety.

[0031] Four proteins encoded by the everninomicin locus (ORFs 12, 18, 26 and 31; SEQ ID NOS 16, 23, 31 and 37) are similar to oxidoreductases and are likely to catalyze the unusual oxidative modifications of the oligosaccharide backbone that are typical of the orthosomycins. In particular, three of these oxidoreductases (ORFs 18, 26 and 31; SEQ ID NOS 23, 31 and 37) show significant similarity to alpha-ketoglutarate-dependent dioxygenases and may therefore be involved in generating the three orthoester/diether linkages found in all orthosomycins (the orthoester linkages between sugar rings C-D and rings G-H, and the aliphatic methylene dioxy group appended to ring H, as shown in FIG. 2).

[0032] Two proteins in the everninomicin locus (ORFs 6, 43; SEQ ID NOS 10 and 52) are similar to C-methyltransferases that transfer methyl groups to deoxyhexose residues, thus accounting for the source of the two deoxyhexose C-methyl groups found in everninomicin (see FIG. 2). ORF 43 (SEQ ID NO 52) forms part of the aminodeoxyhexose gene cluster discussed earlier and is likely to be responsible for incorporating the C-3 methyl group of the evernitrose residue. ORF 6 (SEQ ID NO 10) is thus the likely source of the only remaining C-methyl group of everninomicin, that found on C-3 of the deoxyhexose residue D.

[0033] Four proteins encoded by the everninomicin locus (ORFs 11, 14, 20 and 32; SEQ ID NOS 15, 18, 25 and 38) are likely to be involved in the biosynthesis of the dichloroisoeverninic acid moiety that is found in ester linkage to the sugar residue B of everninomicin (see FIG. 2). ORF 32 (SEQ ID NO 38) encodes a type I polyketide synthase that is similar to fungal 6-methylsalicylic acid synthases and to the AviM orsellinic acid synthase involved in avilamycin biosynthesis in Streptomyces viridochromogenes (Gaisser et al., 1997, J. Bacteriol., Vol. 179, pp. 6271-6278). ORF 32 (SEQ ID NO 38) is proposed to catalyze successive rounds of condensation of acyl-CoA precursors to form orsellinic acid, an aromatic precursor to isoeverninic acid. ORF 14 (SEQ ID NO 18) encodes a protein that is similar to 3-ketoacyl-[ACP]-synthases, including the DpsC protein in the daunorubicin biosynthetic locus of Streptomyces sp. strain C5. The DpsC protein has been proposed to interact with polyketide synthases and to confer specificity for the proper acyl-CoA starter unit (Rajgarhia et al., 1997, J. Bacteriol., Vol. 179, pp. 2690-2696). Similarly, the ORF 14 protein may interact with the ORF 32 (SEQ ID NO 38) polyketide synthase during the synthesis of the orsellinic acid precursor. ORF 11 (SEQ ID NO 15) encodes an O-methyltransferase that shows greatest similarity to bacterial proteins that transfer methyl groups to phenolic hydroxyls, and is therefore likely to catalyze the conversion of orsellinic acid to isoeverninic acid. ORF 20 (SEQ ID NO 25) encodes a protein that is similar to many bacterial non-heme halogenases, and is likely to catalyze the addition of two chlorine atoms to isoeverninic acid to form dichloroisoeverninic acid. A scheme for the biosynthesis of the dichloroisoeverninic acid moiety is shown in FIG. 5.

[0034] Three proteins encoded by the everninomicin locus (ORFs 22, 23 and 33; SEQ ID NOS 27, 28 and 39) are similar to enzymes involved in carbohydrate metabolism and may serve to generate short chain aliphatic alcohol precursors that are subsequently used to modify the variable positions on C-52 of residue H (see FIG. 2). ORFs 22 and 23 (SEQ ID NOS 27 and 28) are similar to subunits of the acetoin dehydrogenase component E1 involved in the catabolism of acetoin (3-hydroxy-2-butanone), while ORF 33 (SEQ ID NO 39) shows some similarity to bacterial phosphoglycolate phosphatases involved in glycolate (hydroxyacetic acid) metabolism.

[0035] Four proteins encoded by the everninomicin locus (ORFs 2, 13, 39 and 47; SEQ ID NOS 5, 17, 45 and 56) are likely to be involved in conferring resistance to everninomicin and/or transporting everninomicin out of the producing bacterial cell. Everninomicin inhibits bacterial protein synthesis, and thus exerts its antibacterial effect, by binding to a specific site on the bacterial 50S ribosomal subunit (McNicholas et al., 2000, Antimicrob. Agents Chemother., Vol. 44, pp. 1121-1126). ORFs 13 and 39 (SEQ ID NOS 17 and 45) encode proteins that are similar to ribosomal RNA methyltransferases and are therefore likely to confer resistance to everninomicin (or its intermediates) by modifying the ribosomes of the producing microorganism. ORF 47 (SEQ ID NO 56) encodes a protein with similarity to a number of bacterial endoglucanases, enzymes that catalyze the hydrolysis of internal beta-1,4-glycosidic linkages. The ORF 47 (SEQ ID NO 56) enzyme may confer resistance to everninomicin or its intermediates by cleaving the beta-1,4-endoglycosidic linkage that is found in the oligosaccharide backbone of all orthosomycins. ORF 2 (SEQ ID NO 5) encodes a protein that is similar to integral membrane antiporters associated with antibiotic biosynthesis in other bacteria and is therefore likely to be involved in transport of everninomicin or its intermediates across the bacterial cell membrane.

[0036] Two proteins encoded by the everninomicin locus (ORFs 48, 49; SEQ ID NOS 57 and 58) are likely to be involved in regulating the expression of one or more of the genes in the locus. The orthosomycins are composed of repeating saccharide units and the biosynthesis of these molecules may be sensitive to the availability of saccharide precursors from primary cellular metabolism. ORF 48 (SEQ ID NO 57) encodes a protein that is similar to LacI family transcriptional repressors that contain sugar binding sites and regulate transcription in response to the presence of small molecules such as saccharides. The ORF 49 (SEQ ID NO 58) protein is similar to glucose kinase and to ROK family transcriptional regulators that have glucose kinase homology. This protein may act as a sensor of hexose levels in the cell and interact with the ORF 48 (SEQ ID NO 57) transcriptional regulator in order to activate expression of one or more genes in the everninomicin locus in response to the availability of saccharide precursors.

[0037] Four proteins encoded by the everninomicin locus (ORFs 9, 17, 25 and 46; SEQ ID NOS 13, 21, 30 and 55) cannot be assigned a putative role in the biosynthesis of everninomicin. ORFs 17, 25 and 46 (SEQ ID NOS 21, 30 and 55) show no significant similarity to proteins in the GenBank database, while the ORF 9 (SEQ ID NO 13) protein shows weak similarity to putative nucleotide-binding proteins involved in sugar biosynthesis.

[0038] Polynucleotide and Amino Acid Sequences:

[0039] The term “isolated polynucleotide” is defined as a polynucleotide removed from the environment in which it naturally occurs. For example, a naturally-occurring DNA molecule present in the genome of a living bacterium is not isolated, but the same molecule separated from the remaining part of the bacterial genome, as a result of, e.g., a cloning event (amplification), is isolated. Typically, an isolated DNA molecule is free from its natural chromosomal context. Such isolated polynucleotides may be part of a vector or a composition and still be defined as isolated in that such a vector or composition is not part of the natural environment of such polynucleotide.

[0040] The polynucleotide of the invention is either RNA or DNA (cDNA, genomic DNA, or synthetic DNA), or modifications, variants, homologs or fragments thereof. The DNA is either double-stranded or single-stranded, and, if single-stranded, is either the coding strand or the non-coding (anti-sense) strand. Any one of the polynucleotide sequences of the invention as shown in FIG. 1 is (a) a coding sequence; (b) a ribonucleotide sequence derived from transcription of (a); (c) a coding sequence which uses the redundancy or degeneracy of the genetic code to encode the same polypeptides; or (d) a regulatory sequence. By “polypeptide” or “protein” is meant any chain of amino acids, regardless of length or post-translational modification (e.g., proteolytic processing or phosphorylation). Both terms are used interchangeably in the present application.
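
By way of illustration only, the relationship between a coding strand, its non-coding (anti-sense) strand and the ribonucleotide sequence derived from transcription can be sketched as follows. The function names and the example sequence are hypothetical and are not taken from FIG. 1; the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Illustrative sketch only: function names and the example sequence are
# hypothetical and do not come from the specification or FIG. 1.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the non-coding (anti-sense) strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    example = "ATGGCC"  # hypothetical six-nucleotide coding fragment
    print(reverse_complement(example))
    print(transcribe(example))
```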

[0041] Consistent with this aspect of the invention, amino acid sequences are provided which are homologous to any one of the amino acid sequences of FIG. 1. As used herein, “homologous amino acid sequence” is any polypeptide which is encoded, in whole or in part, by a nucleic acid sequence which hybridizes at 25-35° C. below critical melting temperature (Tm), to any portion of the coding region nucleic acid sequences of FIG. 1. A homologous amino acid sequence is one that differs from an amino acid sequence shown in FIG. 1 by one or more conservative amino acid substitutions. Such a sequence also encompasses allelic variants (defined below) as well as sequences containing deletions or insertions which retain the functional characteristics of the polypeptide. Preferably, such a sequence is at least 75%, more preferably 80%, and most preferably 90% identical to any amino acid sequence shown in FIG. 1.

[0042] Homologous amino acid sequences include sequences that are identical or substantially identical to the amino acid sequences of FIG. 1. By “amino acid sequence substantially identical” is meant a sequence that is at least 90%, preferably 95%, more preferably 97%, and most preferably 99% identical to an amino acid sequence of reference and that preferably differs from the sequence of reference by a majority of conservative amino acid substitutions.

[0043] Conservative amino acid substitutions are substitutions among amino acids of the same class. These classes include, for example, amino acids having uncharged polar side chains, such as asparagine, glutamine, serine, threonine, and tyrosine; amino acids having basic side chains, such as lysine, arginine, and histidine; amino acids having acidic side chains, such as aspartic acid and glutamic acid; and amino acids having nonpolar side chains, such as glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan, and cysteine.
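
As a non-limiting sketch, the four classes listed above can be encoded as a lookup so that a substitution is scored as conservative when both residues fall in the same class. The one-letter codes are the standard amino acid abbreviations; the dictionary and function names are hypothetical, and the grouping is only the example grouping given in this paragraph.

```python
# Illustrative sketch only: names are hypothetical; the classes mirror the
# example grouping given in the paragraph above.

AMINO_ACID_CLASSES = {
    "uncharged polar": set("NQSTY"),   # Asn, Gln, Ser, Thr, Tyr
    "basic": set("KRH"),               # Lys, Arg, His
    "acidic": set("DE"),               # Asp, Glu
    "nonpolar": set("GAVLIPFMWC"),     # Gly, Ala, Val, Leu, Ile, Pro, Phe, Met, Trp, Cys
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when two different residues belong to the same class."""
    if original.upper() == replacement.upper():
        return False  # identical residues are not a substitution
    return residue_class(original) == residue_class(replacement)
```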

[0044] Homology is measured using sequence analysis software such as Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705. Amino acid sequences are aligned to maximize identity. Gaps may be artificially introduced into the sequence to attain proper alignment. Once the optimal alignment has been set up, the degree of homology is established by recording all of the positions in which the amino acids of both sequences are identical, relative to the total number of positions.
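
The calculation described above can be sketched, purely for illustration, as a count of identical positions over the total number of alignment positions. The sketch assumes the two sequences have already been aligned by separate software, that gaps are written as "-", and that gap positions count toward the total; the function name is hypothetical and this is not the algorithm of any particular software package.

```python
# Illustrative sketch only: assumes pre-aligned input with '-' as the gap
# symbol and counts gap columns in the denominator (an assumption).

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment positions at which both sequences are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical six-column alignment with one gap and one mismatch.
    print(round(percent_identity("MKT-LV", "MKTALI"), 1))
```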

[0045] Homologous polynucleotide sequences are defined in a similar way. Preferably, a homologous sequence is one that is at least 45%, more preferably 60%, and most preferably 85% identical to any one of the coding sequences of FIG. 1.

[0046] Consistent with this aspect of the invention, polypeptides having a sequence homologous to any one of the amino acid sequences of FIG. 1 include naturally-occurring allelic variants, as well as mutants or any other non-naturally occurring variants that retain the inherent characteristics of any polypeptide of FIG. 1.

[0047] As is known in the art, an allelic variant is an alternate form of a polypeptide that is characterized as having a substitution, deletion, or addition of one or more amino acids that does not alter the biological function of the polypeptide. By “biological function” is meant the function of the polypeptide in the cells in which it naturally occurs. A polypeptide can have more than one biological function.

[0048] Also consistent with this aspect of the invention is a substantially purified polypeptide or polypeptide derivative having an amino acid sequence encoded by a polynucleotide of the invention. A “substantially purified polypeptide” as used herein is defined as a polypeptide that is separated from the environment in which it naturally occurs and/or that is free of the majority of the polypeptides that are present in the environment in which it was synthesized. For example, a substantially purified polypeptide is free from cellular polypeptides. Those skilled in the art would readily understand that the polypeptides of the invention may be purified from a natural source, i.e., a bacterial cell of the order Actinomycetales, or produced by recombinant means.

[0049] The nucleic acids of ORFs 1 to 49 can be isolated, optionally modified and inserted into a host cell to create and/or modify a metabolic (biosynthetic) pathway and thereby enable that host cell to synthesize and/or modify various metabolites.

[0050] Alternatively, the everninomicin gene cluster can be expressed in the host cell and the encoded everninomicin polypeptides recovered for use as chemical reagents, e.g. in the ex vivo synthesis and/or chemical modification of various metabolites. Either application typically entails insertion of one or more nucleic acids encoding one or more isolated and/or modified everninomicin open reading frames into a host cell. The nucleic acid(s) are typically in an expression vector, a construct containing control elements suitable to direct expression of the everninomicin polypeptides. The expressed everninomicin polypeptides in the host cell then act as components of a metabolic/biosynthetic pathway (in which case the synthetic product of the pathway is typically recovered) or the everninomicin polypeptides themselves are recovered. Using the sequence information provided herein, cloning and expression of everninomicin nucleic acids can be accomplished using routine and well-known methods.

[0051] The ORFs (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58) can be used to synthesize everninomicin antibiotics and/or analogues thereof. Alternatively, various components of the everninomicin gene cluster can be used to synthesize and/or chemically modify a wide variety of biomolecules/metabolites.

[0052] Polynucleotides encoding homologous polypeptides or allelic variants are retrieved by polymerase chain reaction (PCR) amplification of genomic bacterial DNA extracted by conventional methods. This involves the use of synthetic oligonucleotide primers matching upstream and downstream of the 5′ and 3′ ends of the encoding domain. Suitable primers are designed according to the nucleotide sequence information provided in FIG. 1. The procedure is as follows: a primer is selected which consists of 10 to 40, preferably 15 to 25 nucleotides. It is advantageous to select primers containing C and G nucleotides in a proportion sufficient to ensure efficient hybridization; i.e., an amount of C and G nucleotides of at least 40%, preferably 50% of the total nucleotide content. A standard PCR reaction typically contains 0.5 to 5 Units of Taq DNA polymerase per 100 μL, 20 to 200 μM of each deoxynucleotide, preferably at equivalent concentrations, 0.5 to 2.5 mM magnesium over the total deoxynucleotide concentration, 10⁵ to 10⁶ target molecules, and about 20 pmol of each primer. About 25 to 50 PCR cycles are performed, with an annealing temperature 15° C. to 5° C. below the true Tm of the primers. A more stringent annealing temperature improves discrimination against incorrectly annealed primers and reduces incorporation of incorrect nucleotides at the 3′ end of primers. A denaturation temperature of 95° C. to 97° C. is typical, although higher temperatures may be appropriate for denaturation of G+C-rich targets. The number of cycles performed depends on the starting concentration of target molecules, though typically more than 40 cycles is not recommended as non-specific background products tend to accumulate.
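
The primer selection criteria above (length of 10 to 40 nucleotides, C and G content of at least 40%, annealing 15° C. to 5° C. below the primer Tm) can be sketched as a simple screen. This is a minimal illustration under stated assumptions: the function names and the example primer are hypothetical, the primer Tm is supplied by the user rather than computed, and no other design considerations (secondary structure, primer dimers) are addressed.

```python
# Illustrative sketch only: names and the example primer are hypothetical.

def gc_fraction(primer: str) -> float:
    """Fraction of nucleotides in the primer that are G or C."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer is empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def primer_meets_criteria(primer: str, min_len: int = 10, max_len: int = 40,
                          min_gc: float = 0.40) -> bool:
    """Check the length window and minimum G+C content stated in the text."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


def annealing_window(primer_tm_celsius: float) -> tuple:
    """Annealing range 15 to 5 degrees C below a user-supplied primer Tm."""
    return (primer_tm_celsius - 15.0, primer_tm_celsius - 5.0)


if __name__ == "__main__":
    candidate = "ATGCGCGTACCGGATCCTGA"  # hypothetical 20-mer
    print(primer_meets_criteria(candidate), annealing_window(60.0))
```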

[0053] An alternative method for retrieving polynucleotides encoding homologous polypeptides or allelic variants is by hybridization screening of a DNA or RNA library. Hybridization procedures are well-known in the art and are described in Ausubel et al., (Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc., 1994), Silhavy et al. (Silhavy et al. Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, 1984), and Davis et al. (Davis et al. A Manual for Genetic Engineering: Advanced Bacterial Genetics, Cold Spring Harbor Laboratory Press, 1980). Important parameters for optimizing hybridization conditions are reflected in a formula used to obtain the critical melting temperature above which two complementary DNA strands separate from each other (Casey & Davidson, Nucl. Acid Res. (1977) 4:1539). For polynucleotides of about 600 nucleotides or larger, this formula is as follows: Tm=81.5+0.5×(% G+C)+1.6 log (positive ion concentration)−0.6×(% formamide). Under appropriate stringency conditions, hybridization temperature (Th) is approximately 20 to 40° C., 20 to 25° C., or, preferably 30 to 40° C. below the calculated Tm. Those skilled in the art will understand that optimal temperature and salt conditions can be readily determined.
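
For illustration, the formula as printed in this paragraph can be evaluated directly. The sketch below uses the coefficients exactly as they appear in the text and assumes that "log" denotes the base-10 logarithm and that the positive ion concentration is expressed in mol/L; both are assumptions, and the coefficients should be checked against the cited reference before any practical use. The function names are hypothetical.

```python
# Illustrative sketch only: coefficients are copied from the paragraph above;
# base-10 log and molar units are assumptions, and names are hypothetical.
import math


def melting_temperature(gc_percent: float, cation_molar: float,
                        formamide_percent: float = 0.0) -> float:
    """Tm in degrees C from the formula as stated in the text."""
    if cation_molar <= 0:
        raise ValueError("cation concentration must be positive")
    return (81.5
            + 0.5 * gc_percent
            + 1.6 * math.log10(cation_molar)
            - 0.6 * formamide_percent)


def hybridization_window(tm_celsius: float) -> tuple:
    """Broadest stated range for Th: 40 to 20 degrees C below Tm."""
    return (tm_celsius - 40.0, tm_celsius - 20.0)
```

Under these assumptions, raising the formamide percentage lowers the calculated Tm, which is consistent with the lower incubation temperature given for formamide-containing buffer elsewhere in the specification.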

[0054] For the polynucleotides of the invention, stringent conditions are achieved for both pre-hybridizing and hybridizing incubations (i) within 4-16 hours at 42° C., in 6×SSC containing 50% formamide, or (ii) within 4-16 hours at 65° C. in an aqueous 6×SSC solution (1 M NaCl, 0.1 M sodium citrate (pH 7.0)).

[0055] The native everninomicin gene cluster ORFs can be re-ordered, modified and combined with other biosynthetic units to produce a wide variety of molecules. Large chemical libraries can be produced and screened for a desired activity.

[0056] Useful homologs and fragments thereof that do not occur naturally are designed using known methods for identifying regions of a polypeptide that are likely to tolerate amino acid sequence changes and/or deletions. As an example, homologous polypeptides from different species are compared; conserved sequences are identified. The more divergent sequences are the most likely to tolerate sequence changes. Homology among sequences may be analyzed using the BLAST homology searching algorithm of Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997).

[0057] Alternatively, identification of homologous polypeptides or polypeptide derivatives encoded by polynucleotides of the invention which have activity in the everninomicin biosynthetic pathway may be achieved by screening for cross-reactivity with an antibody raised against the polypeptide of reference having an amino acid sequence of FIG. 1. The procedure is as follows: an antibody is raised against a purified reference polypeptide, a fusion polypeptide (for example, an expression product of MBP, GST, or His-tag systems), or a synthetic peptide derived from the reference polypeptide. Where an antibody is raised against a fusion polypeptide, two different fusion systems are employed. Specific antigenicity can be determined according to a number of methods, including Western blot (Towbin et al., Proc. Natl. Acad. Sci. USA (1979) 76:4350), dot blot, and ELISA, as described below.

[0058] In a Western blot assay, the product to be screened, either as a purified preparation or a total E. coli extract, is subjected to SDS-PAGE electrophoresis as described by Laemmli (Nature (1970) 227:680). After transfer to a nitrocellulose membrane, the material is further incubated with the antibody diluted in the range of dilutions from about 1:5 to about 1:5000, preferably from about 1:100 to about 1:500. Specific antigenicity is shown once a band corresponding to the product exhibits reactivity at any of the dilutions in the above range.

[0059] In an ELISA assay, the product to be screened is preferably used as the coating antigen. A purified preparation is preferred, although a whole cell extract can also be used. Briefly, about 100 μl of a preparation at about 10 μg protein/ml are distributed into wells of a 96-well polycarbonate ELISA plate. The plate is incubated for 2 hours at 37° C. then overnight at 4° C. The plate is washed with phosphate-buffered saline (PBS) containing 0.05% Tween 20 (PBS/Tween buffer). The wells are saturated with 250 μl PBS containing 1% bovine serum albumin (BSA) to prevent non-specific antibody binding. After 1 hour incubation at 37° C., the plate is washed with PBS/Tween buffer. The antibody is serially diluted in PBS/Tween buffer containing 0.5% BSA. 100 μl of each dilution are added per well. The plate is incubated for 90 minutes at 37° C., washed and evaluated according to standard procedures. For example, a goat anti-rabbit peroxidase conjugate is added to the wells when specific antibodies were raised in rabbits. Incubation is carried out for 90 minutes at 37° C. and the plate is washed. The reaction is developed with the appropriate substrate and the reaction is measured by colorimetry (absorbance measured spectrophotometrically). Under the above experimental conditions, a positive reaction is shown by O.D. values greater than those of a non-immune control serum.

[0060] In a dot blot assay, a purified product is preferred, although a whole cell extract can also be used. Briefly, a solution of the product at about 100 μg/ml is serially two-fold diluted in 50 mM Tris-HCl (pH 7.5). 100 μl of each dilution are applied to a 0.45 μm nitrocellulose membrane set in a 96-well dot blot apparatus (Biorad). The buffer is removed by applying vacuum to the system. Wells are washed by addition of 50 mM Tris-HCl (pH 7.5) and the membrane is air-dried. The membrane is saturated in blocking buffer (50 mM Tris-HCl (pH 7.5), 0.15 M NaCl, 10 g/L skim milk) and incubated with an antibody dilution from about 1:50 to about 1:5000, preferably about 1:500. The reaction is revealed according to standard procedures. For example, a goat anti-rabbit peroxidase conjugate is added to the wells when rabbit antibodies are used. Incubation is carried out for 90 minutes at 37° C. and the blot is washed. The reaction is developed with the appropriate substrate and stopped. The reaction is measured visually by the appearance of a colored spot, e.g., by colorimetry. Under the above experimental conditions, a positive reaction is shown once a colored spot is associated with a dilution of at least about 1:5, preferably of at least about 1:500.
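As a worked illustration of the two-fold serial dilution used in the dot blot assay, the sketch below lists the antigen concentration at each step and the amount of protein applied per spot when 100 μl is loaded. The function names and the number of dilution steps are assumptions chosen for the example; the specification does not state how many dilutions are made.

```python
def twofold_series(start_ug_per_ml=100.0, steps=8):
    """Concentrations (ug/ml) of a two-fold serial dilution, undiluted first.

    `steps` is a hypothetical choice; the text does not fix the number of
    dilutions.
    """
    return [start_ug_per_ml / (2 ** i) for i in range(steps)]


def protein_per_spot_ug(conc_ug_per_ml, volume_ul=100.0):
    """Mass of protein (ug) applied to the membrane for one spot."""
    return conc_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    for conc in twofold_series():
        print(f"{conc:8.3f} ug/ml -> {protein_per_spot_ug(conc):7.4f} ug per spot")
```

Starting from about 100 μg/ml, the first spot therefore carries about 10 μg of protein and each subsequent spot carries half the amount of the one before it.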

[0061] Another aspect of the invention provides a process for purifying a polypeptide or polypeptide derivative of the invention by affinity chromatography using as a ligand either an antibody or an orthosomycin-related compound which binds to the polypeptide. The antibody is either polyclonal or monoclonal. Purified IgGs are prepared from an antiserum using standard methods (see, e.g., Coligan et al., Current Protocols in Immunology (1994) John Wiley & Sons, Inc., New York, N.Y.). Conventional chromatography supports are described in, e.g., Antibodies: A Laboratory Manual, D. Lane, E. Harlow, Eds. (1988).

[0062] Consistent with this aspect of the invention, polypeptide derivatives are provided that are partial sequences of the amino acid sequences of FIG. 1, partial sequences of polypeptide sequences homologous to the amino acid sequences of FIG. 1, polypeptides derived from full-length polypeptides by internal deletion, and fusion proteins.

[0063] Polynucleotides of 30 to 600 nucleotides encoding partial sequences of sequences homologous to nucleotide sequences of FIG. 1 are retrieved by PCR amplification using the parameters outlined above and using primers matching the sequences upstream and downstream of the 5′ and 3′ ends of the fragment to be amplified. The template polynucleotide for such amplification is either the full length polynucleotide homologous to a polynucleotide sequence of FIG. 1, or a polynucleotide contained in a mixture of polynucleotides such as a DNA or RNA library. As an alternative method for retrieving the partial sequences, screening hybridization is carried out under conditions described above and using the formula for calculating Tm. Where fragments of 30 to 600 nucleotides are to be retrieved, the calculated Tm is corrected by subtracting (600/polynucleotide size in base pairs) and the stringency conditions are defined by a hybridization temperature that is 5 to 10° C. below Tm. Where oligonucleotides shorter than 20-30 bases are to be obtained, the formula for calculating the Tm is as follows: Tm=4×(G+C)+2×(A+T). For example, an 18-nucleotide fragment of 50% G+C would have an approximate Tm of 54° C. Short peptides that are fragments of the polypeptide sequences of FIG. 1 or of their homologous sequences are obtained directly by chemical synthesis (E. Gross and H. J. Meienhofer, 4 The Peptides: Analysis, Synthesis, Biology; Modern Techniques of Peptide Synthesis, John Wiley & Sons (1981), and M. Bodanszky, Principles of Peptide Synthesis, Springer-Verlag (1984)).
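The Tm arithmetic in this paragraph can be sketched as follows. This is a minimal illustration with hypothetical function names. The short-oligonucleotide formula is taken directly from the text. The length correction takes an already calculated Tm as its input, because the longer-fragment formula is defined elsewhere in the specification and is not reproduced here.

```python
def short_oligo_tm(seq):
    """Tm = 4 x (G+C) + 2 x (A+T), for oligonucleotides under ~20-30 bases."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at != len(s):
        raise ValueError("sequence must contain only A, C, G and T")
    return 4 * gc + 2 * at


def corrected_tm(calculated_tm, size_bp):
    """Subtract (600 / size in base pairs) from a previously calculated Tm."""
    if size_bp <= 0:
        raise ValueError("size_bp must be positive")
    return calculated_tm - 600.0 / size_bp


def hybridization_window(tm):
    """Hybridization temperature range lying 5 to 10 degrees C below Tm."""
    return (tm - 10, tm - 5)


if __name__ == "__main__":
    # Hypothetical 18-mer with 9 G/C and 9 A/T bases (50% G+C).
    oligo = "GCGCGCGCGATATATATA"
    print(short_oligo_tm(oligo))  # 54, matching the example in the text
```

For the 18-nucleotide example at 50% G+C, the formula gives 4×9 + 2×9 = 54, in agreement with the approximate Tm of 54° C. stated above.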

[0064] Polynucleotides encoding polypeptide fragments and polypeptides having large internal deletions are constructed using standard methods (Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc., 1994). Such methods include standard PCR, inverse PCR, restriction enzyme treatment of cloned DNA molecules, or the method of Kunkel et al. (Proc. Natl. Acad. Sci. USA (1985) 82:448). Components for these methods and instructions for their use are readily available from various commercial sources such as Stratagene. Once the deletion mutants have been constructed, they are tested for their ability to improve production of everninomicin or to generate novel analogues of the antibiotic or natural products of the orthosomycin class as described above.

[0065] As used herein, a fusion polypeptide is one that contains a polypeptide or a polypeptide derivative of the invention fused at the N- or C-terminal end to any other polypeptide (hereinafter referred to as a peptide tail). A simple way to obtain such a fusion polypeptide is by translation of an in-frame fusion of the polynucleotide sequences, i.e., a hybrid gene. The hybrid gene encoding the fusion polypeptide is inserted into an expression vector which is used to transform or transfect a host cell. Alternatively, the polynucleotide sequence encoding the polypeptide or polypeptide derivative is inserted into an expression vector in which the polynucleotide encoding the peptide tail is already present. Such vectors and instructions for their use are commercially available, e.g. the pMal-c2 or pMal-p2 system from New England Biolabs, in which the peptide tail is a maltose binding protein, the glutathione-S-transferase system of Pharmacia, or the His-Tag system available from Novagen. These and other expression systems provide convenient means for further purification of polypeptides and derivatives of the invention.
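The notion of an in-frame fusion (a hybrid gene) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the toy sequences, and the decision to drop a terminal stop codon from the upstream partner are not taken from the specification or from any vendor's vector system.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a coding sequence into consecutive three-base codons."""
    s = seq.upper()
    return [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]


def make_in_frame_fusion(upstream_cds, downstream_cds):
    """Join two coding sequences so the downstream partner stays in frame.

    The upstream partner must be a whole number of codons. A terminal stop
    codon on the upstream partner is removed so translation reads through;
    an internal stop codon is rejected.
    """
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    if len(up) % 3 != 0:
        raise ValueError("upstream partner is not a whole number of codons")
    up_codons = split_codons(up)
    if up_codons and up_codons[-1] in STOP_CODONS:
        up_codons = up_codons[:-1]
    if any(codon in STOP_CODONS for codon in up_codons):
        raise ValueError("upstream partner contains an internal stop codon")
    return "".join(up_codons) + down


if __name__ == "__main__":
    # Toy sequences only: a "peptide tail" followed by a "polypeptide".
    print(make_in_frame_fusion("ATGAAATAA", "ATGGGCTGA"))
```

The same check applies whichever partner carries the peptide tail, since the tail may be fused at either the N- or the C-terminal end.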

[0066] Vectors, Transformed Cells, Primers and Probes:

[0067] A polynucleotide molecule according to the invention, including RNA, DNA, or modifications or combinations thereof, has various applications. A DNA molecule is used, for example, for producing a polypeptide of the invention in a recombinant host system. Another aspect of the invention encompasses (a) an expression cassette containing a DNA molecule of the invention placed under the control of the elements required for expression, in particular under the control of an appropriate promoter; (b) an expression vector containing an expression cassette of the invention; (c) a prokaryotic cell transformed with an expression cassette and/or vector of the invention, as well as (d) a process for producing a polypeptide or polypeptide derivative encoded by a polynucleotide of the invention, which involves culturing a prokaryotic cell transformed with an expression cassette and/or vector of the invention under conditions that allow expression of the DNA molecule of the invention, and recovering the encoded polypeptide or polypeptide derivative from the culture.

[0068] A recombinant expression system is selected from prokaryotic hosts. Bacterial cells are available to those skilled in the art from a number of different sources, including commercial sources, e.g., the American Type Culture Collection (ATCC; Rockville, Md.). Commercial sources of cells used for recombinant protein expression also provide instructions for usage of the cells.

[0069] The choice of the expression system depends on the features desired for the expressed polypeptide. For example, it may be useful to produce a polypeptide of the invention in a particular lipidated form or any other form.

[0070] One skilled in the art would readily understand that not all vectors and expression control sequences and hosts would be expected to express equally well the polynucleotides of this invention. With the guidelines described below, however, a selection of vectors, expression control sequences and hosts may be made without undue experimentation and without departing from the scope of this invention.

[0071] In selecting a vector, a host must be chosen that is compatible with the vector, which is to exist and possibly replicate in it. Consideration is given to the vector copy number, the ability to control the copy number, and the expression of other proteins encoded by the vector, such as those conferring antibiotic resistance. In selecting an expression control sequence, a number of variables are considered. Among the important variables are the relative strength of the sequence (e.g. the ability to drive expression under various conditions), the ability to control the sequence's function, and compatibility between the polynucleotide to be expressed and the control sequence (e.g. secondary structures are considered to avoid hairpin structures which prevent efficient transcription). In selecting the host, unicellular hosts are selected which are compatible with the selected vector, tolerant of any possible toxic effects of the expressed product, able to secrete the expressed product efficiently if such is desired, able to express the product in the desired conformation, and easily scaled up. Regard is also given to ease of purification of the final product, which may be the expressed polypeptide or the natural product, e.g. an antibiotic, which is a product of the biosynthetic pathway of which the expressed polypeptide is a part.

[0072] The choice of the expression cassette depends on the host system selected as well as the features desired for the expressed polypeptide or natural product. Typically, an expression cassette includes a promoter that is functional in the selected host system and can be constitutive or inducible; a ribosome binding site; a start codon (ATG) if necessary; optionally a region encoding a leader peptide; a DNA molecule of the invention; a stop codon; and optionally a 3′ terminal region (translation and/or transcription terminator). The leader peptide encoding region is adjacent to the polynucleotide of the invention and placed in proper reading frame. The leader peptide-encoding region, if present, is homologous or heterologous to the DNA molecule encoding the mature polypeptide and is compatible with the secretion apparatus of the host used for expression. The open reading frame constituted by the DNA molecule of the invention, solely or together with the leader peptide, is placed under the control of the promoter so that transcription and translation occur in the host system. Promoters and leader peptide encoding regions are widely known and available to those skilled in the art.
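The ordering of elements in a typical expression cassette, as listed above, can be sketched as a small data structure. The class name, field names, and toy sequences below are hypothetical and serve only to show which elements are required and which are optional; they do not describe any particular vector.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExpressionCassette:
    """Elements of a cassette in 5' to 3' order; optional ones may be None."""
    promoter: str
    ribosome_binding_site: str
    coding_sequence: str                          # DNA molecule of the invention
    stop_codon: str
    start_codon: Optional[str] = None             # supplied only "if necessary"
    leader_peptide_region: Optional[str] = None   # optional secretion leader
    terminator: Optional[str] = None              # optional 3' terminal region

    def parts(self) -> List[Tuple[str, str]]:
        """Return the elements that are present, in 5' to 3' order."""
        ordered = [
            ("promoter", self.promoter),
            ("ribosome binding site", self.ribosome_binding_site),
            ("start codon", self.start_codon),
            ("leader peptide region", self.leader_peptide_region),
            ("coding sequence", self.coding_sequence),
            ("stop codon", self.stop_codon),
            ("3' terminal region", self.terminator),
        ]
        return [(name, seq) for name, seq in ordered if seq]

    def open_reading_frame(self) -> str:
        """Concatenate the translated elements placed under promoter control."""
        pieces = (self.start_codon, self.leader_peptide_region,
                  self.coding_sequence, self.stop_codon)
        return "".join(p for p in pieces if p)

    def leader_in_frame(self) -> bool:
        """A leader region keeps the reading frame only if it is whole codons."""
        leader = self.leader_peptide_region
        return leader is None or len(leader) % 3 == 0
```

Modeling the leader region and terminator as optional fields mirrors the "optionally" wording of the paragraph, while the promoter, ribosome binding site, coding sequence, and stop codon are always required.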

[0073] The expression cassette is typically part of an expression vector, which is selected for its ability to replicate in the chosen expression system. Expression vectors (e.g., plasmids and cosmids) are widely known and are readily available to those skilled in the art. For bacterial vectors, the polynucleotide of the invention is inserted into the bacterial genome or remains in a free state as part of a plasmid. Methods for transforming host cells with expression vectors are well-known in the art.

[0074] The sequence information provided in the present application enables the design of specific nucleotide probes and primers that are used for identifying and isolating putative orthosomycin-producing microorganisms. Accordingly, an aspect of the invention provides a nucleotide probe or primer having a sequence found in or derived by degeneracy of the genetic code from a sequence shown in FIG. 1.

[0075] The term “probe” as used in the present application refers to DNA (preferably single stranded) or RNA molecules (or modifications or combinations thereof) that hybridize under stringent conditions, as defined above, to nucleic acid molecules of FIG. 1 or to sequences homologous to those of FIG. 1, or to their complementary or anti-sense sequences. Generally, probes are significantly shorter than full-length sequences. Such probes contain from about 5 to about 100, preferably from about 10 to about 80, nucleotides. In particular, probes have sequences that are at least 75%, preferably at least 85%, more preferably at least 95% homologous to a portion of a sequence disclosed in FIG. 1 or that are complementary to such sequences. Probes may contain modified bases such as inosine, methyl-5-deoxycytidine, deoxyuridine, dimethylamino-5-deoxyuridine, or diamino-2,6-purine. Sugar or phosphate residues may also be modified or substituted. For example, a deoxyribose residue may be replaced by a polyamide (Nielsen et al., Science (1991) 254:1497) and phosphate residues may be replaced by ester groups such as diphosphate, alkyl, arylphosphonate and phosphorothioate esters. In addition, the 2′-hydroxyl group on ribonucleotides may be modified by including such groups as alkyl groups.
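The length and homology criteria for a probe can be checked in silico before any hybridization experiment. The sketch below is a simplification under stated assumptions: percent homology is approximated by ungapped percent identity over the best-matching window, only unmodified A, C, G and T bases are handled, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_ungapped_identity(probe, target):
    """Highest percent identity of the probe against any target window."""
    p, t = probe.upper(), target.upper()
    if not p or len(p) > len(t):
        return 0.0
    best = 0
    for start in range(len(t) - len(p) + 1):
        window = t[start:start + len(p)]
        matches = sum(1 for a, b in zip(p, window) if a == b)
        best = max(best, matches)
    return 100.0 * best / len(p)


def probe_meets_criteria(probe, target, min_identity=75.0,
                         min_len=5, max_len=100):
    """Check probe length (about 5 to 100 nt) and identity on either strand."""
    if not (min_len <= len(probe) <= max_len):
        return False
    identity = max(
        best_ungapped_identity(probe, target),
        best_ungapped_identity(reverse_complement(probe), target),
    )
    return identity >= min_identity
```

Raising `min_identity` to 85 or 95 corresponds to the preferred and more preferred thresholds named in the paragraph. A real screen would also need to apply the stringent hybridization conditions defined earlier, which this identity check does not model.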

[0076] Probes of the invention are used for identifying and isolating putative orthosomycin-producing microorganisms, as capture or detection probes. Such capture probes are conventionally immobilized on a solid support, directly or indirectly, by covalent means or by passive adsorption. A detection probe is labeled by a detection marker selected from: radioactive isotopes, enzymes such as peroxidase, alkaline phosphatase, enzymes able to hydrolyze a chromogenic or fluorogenic or luminescent substrate, compounds that are chromogenic or fluorogenic or luminescent, nucleotide base analogs, and biotin.

[0077] Probes of the invention are used in any conventional hybridization technique, such as dot blot (Maniatis et al., Molecular Cloning: A Laboratory Manual (1982) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), Southern blot (Southern, J. Mol. Biol. (1975) 98:503), northern blot (identical to Southern blot with the exception that RNA is used as a target), or the sandwich technique (Dunn et al., Cell (1977) 12:23). The latter technique involves the use of a specific capture probe and/or a specific detection probe with nucleotide sequences that at least partially differ from each other.

[0078] A primer is a probe of usually about 10 to about 40 nucleotides that is used to initiate enzymatic polymerization of DNA in an amplification process (e.g., PCR), in an elongation process, or in a reverse transcription method. Primers used in diagnostic methods involving PCR are labeled by methods known in the art.

[0079] As described herein, the invention also encompasses (i) a reagent comprising a probe of the invention for detecting and/or isolating putative orthosomycin-producing microorganisms; (ii) a method for detecting and/or isolating putative orthosomycin-producing microorganisms, in which DNA or RNA is extracted from the microorganism and denatured, and exposed to a probe of the invention, for example, a capture probe or detection probe or both, under stringent hybridization conditions, such that hybridization is detected; and (iii) a method for detecting and/or isolating putative orthosomycin-producing microorganisms, in which (a) a sample is recovered or derived from the microorganism, (b) DNA is extracted therefrom, (c) the extracted DNA is primed with at least one, and preferably two, primers of the invention and amplified by polymerase chain reaction, and (d) the amplified DNA fragment is produced.

[0080] It is understood that the embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

0 SEQUENCE LISTING <160> NUMBER OF SEQ ID NOS: 58 <210> SEQ ID NO 1 <211> LENGTH: 1987 <212> TYPE: DNA <213> ORGANISM: M. carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (926)..(1675) <223> OTHER INFORMATION: ORF 1, negative strandedness <400> SEQUENCE: 1 gagatccata tccgcagcgt cggggacggg cactccatta ccgggggcct ccccggcacc 60 gcgaggtgtg gcgccagggg ccgcgcggtg gacggcgacc gaggtcgcca gcgcgtctgg 120 gtggtcggcg tggcggttgt cccgcagggt gccttggcgg accaggttct gccggcgtcc 180 ggacgccgcc tcgacatcgt tggggaggtt cagccgcaca gctaccgaca gcgtgcagcg 240 gggagtaggt ctgccggctc gaagtgtccg tggacggcat cgcccagggc gggccggagc 300 tgctgatcgt ggccgtcagc ggggcggcgc cggacgtgcc gtgctgcgct ccgacgccgc 360 cgcgctcgac ctcgacctcg gggagaggca cccgtcggcg tgccctcgtc gacccggacg 420 gtgacccgcg ccaacgccgg tgactgtccg tggaccgtgg ccgccgccct caccgtcacc 480 cccggctgat cggcgcctcg acgtcggtgc tcagccgcgc cggcgccacc gacgccggtg 540 gggtcgccgc cccgtcgctg gcgctgagcg tctcggggcg gctggccacc acgaccggca 600 acggcgatct gcgggtatgc gccgagggtc acgtcggtga cgtcagcggc tggcccggct 660 tcgccggctg gaggccatcg aggacatcac catgctctgc gtgcccgatc tggtcaccgc 720 cggccagcag gggccatcga cgacgaggcg tcagggccgt gccgctggcg atgatcgtgg 780 actgcgagct ggtgggtgac cgggtggccg tcctcgatcc gccgtccggc ctgcacccgc 840 agcggatccg ggaacggcgg atgggcgtcg ccggctgcga ctccaggtgc cgccggtccg 900 gatcccgtcg gtggatcgag gctcaggccg cccgccgcca gtagacgctg tactcgtcga 960 tcacctgaag tggttcggtg actccctggg ccgcgcggaa ctcctggacc gccttgcggc 1020 aggccggaat gacgtagtcg tcgatcacga cgtatccgcc cgggctgact ttcgcataca 1080 ggttgaccag ggcgtccctg gtcgactcgt agaggtcgcc gtcgagtcgc agcacggcga 1140 gttggctgat gggcgcgtgt ggcagcgtgt ccgagaacca ccccggcagg aaccgcacct 1200 ggtcgtccag gagcccgtag cgggcgaagt tggcttgtac gacctccacc gggatgccca 1260 gcacgtcatt gcagtggtgc agccccaggg cctggtccat cgggtgaccg tcggccccgg 1320 tgtccgggat cccttcgaac gaatccgcca cccacaccgt ccggtcccgg atcccgtagg 1380 cctcgaacac cccacgggcc atgatgcaca cgccgccgcg ccacacgccc gtctcgatga 1440 agtcaccggg gacgccgtcc gcgatgacct 
gctccaacag ggcgcggatg ttcctgatgc 1500 gcttcaaccc gaccatggtg tgcgccatgc tcggccagtc cttgccgttc tcccggttgg 1560 tcgccttgaa ctcccgctcg tgcagccact ggttgggcac cggcgggtcc tcgtagatca 1620 ggttcgtgac gaccttttcg aggagatcaa gatagagact tcggggatgc tccatgacgg 1680 tccttcgcgc attgggatcg gctgcggcca cggcggaggg ctcagcgggg aggcgggcgg 1740 cctgcggggg ctttcggcat ttccccgcat tctcggtcca ccgaggagtt cacggaacca 1800 cccgcttgcg cggatccggt tccggacctt cgtcctcgct cggatccccg gaccggagtg 1860 acgcgggcgc atgactcggg gccggaatcg tgcaccgcca gacgaatcga tgtgcggggc 1920 ggtggtcccg gccgcagatc gagcgaacgt ctgtactcat ctggcatatg atcgcacgcc 1980 cttcgtc 1987 <210> SEQ ID NO 2 <211> LENGTH: 250 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 2 Met Glu His Pro Arg Ser Leu Tyr Leu Asp Leu Leu Glu Lys Val Val 1 5 10 15 Thr Asn Leu Ile Tyr Glu Asp Pro Pro Val Pro Asn Gln Trp Leu His 20 25 30 Glu Arg Glu Phe Lys Ala Thr Asn Arg Glu Asn Gly Lys Asp Trp Pro 35 40 45 Ser Met Ala His Thr Met Val Gly Leu Lys Arg Ile Arg Asn Ile Arg 50 55 60 Ala Leu Leu Glu Gln Val Ile Ala Asp Gly Val Pro Gly Asp Phe Ile 65 70 75 80 Glu Thr Gly Val Trp Arg Gly Gly Val Cys Ile Met Ala Arg Gly Val 85 90 95 Phe Glu Ala Tyr Gly Ile Arg Asp Arg Thr Val Trp Val Ala Asp Ser 100 105 110 Phe Glu Gly Ile Pro Asp Thr Gly Ala Asp Gly His Pro Met Asp Gln 115 120 125 Ala Leu Gly Leu His His Cys Asn Asp Val Leu Gly Ile Pro Val Glu 130 135 140 Val Val Gln Ala Asn Phe Ala Arg Tyr Gly Leu Leu Asp Asp Gln Val 145 150 155 160 Arg Phe Leu Pro Gly Trp Phe Ser Asp Thr Leu Pro His Ala Pro Ile 165 170 175 Ser Gln Leu Ala Val Leu Arg Leu Asp Gly Asp Leu Tyr Glu Ser Thr 180 185 190 Arg Asp Ala Leu Val Asn Leu Tyr Ala Lys Val Ser Pro Gly Gly Tyr 195 200 205 Val Val Ile Asp Asp Tyr Val Ile Pro Ala Cys Arg Lys Ala Val Gln 210 215 220 Glu Phe Arg Ala Ala Gln Gly Val Thr Glu Pro Leu Gln Val Ile Asp 225 230 235 240 Glu Tyr Ser Val Tyr Trp Arg Arg Ala Ala 245 250 <210> SEQ ID NO 3 <211> LENGTH: 536 <212> TYPE: DNA <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 3 gaattcctag tgttcggcgc ggttgcgggc tcgccgatgt catggaaaac actagacaag 60 tgattcccga cgccgggtgg gccggcgtgg cgccgagcgc ggtcgcggcg gccagggaca 120 ccggagcccc gccccgaatc cgccggccag ggccctcgcc gcgcggcagg acctcggtcg 180 atccgtcggt cggaccgccg cccgctgccc ctacccgcca ggaaggtgca ccctgttctg 240 ctgtgggcca aggtctcgac gcccgccgcc ttgcgaatcc gctgccccct tcttttcctg 300 ccctcgatca atcgaggttc atcgacatga aaggggctag gattccgcca gtgccgaccg 360 ggccccgtcg ccggatgccc gagccgcgcc cgaacgaact gaccggtctg gcggacgccc 420 gcacgacgat gggcccgttc accgatcgtg cgcgatggag gattgatgat cgcgagcgcc 480 gcacccgtgg ctcccctggc ttcacatcaa ttggtgttgg ttcttctcga ggtcgg 536 <210> SEQ ID NO 4 <211> LENGTH: 3446 <212> TYPE: DNA <213> ORGANISM: M. carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (3)..(1037) <223> OTHER INFORMATION: ORF 2 (positive strandedness) incomplete: C-terminus only (N-terminus undetermined) <221> NAME/KEY: misc_feature <222> LOCATION: (1077)..(2231) <223> OTHER INFORMATION: ORF 3 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (2242)..(3444) <223> OTHER INFORMATION: ORF 4 (negative strandedness) Incomplete: C-terminus only (N-terminus undetermined) <400> SEQUENCE: 4 ccgggctgca cgtcgacctg cgactgatcc gacgccgggc cggcacggtc gccacggtga 60 ccatgggtgg cctcctgctg cccctcgggt tgggcgtggc caccggcctg ctggtgccgg 120 cggcgctgtt ggcggcgacg gaccagcgcg tgatgttcgc cttcttcctc ggggtggcga 180 tggccgtcag cgccgtgccg gtgatcgcca agacgctcac cgacatgcgg ctgatgcacc 240 gtgacgtcgg tcagctcatc ctcgccgcag cgtccctgga cgacgcgttc gcctggttca 300 tgctgtcgct gatctcgtcc atggcggtca gcgccctcac cgtggggaac gtgctggcct 360 cgctgctcaa cctcgtcctg ttcatcgtcg cggcggcgct gatcggccgc ccggtggtca 420 ggcgtgcgat gcggtgggcg aacgcccaga tcgacgtggg gccagccgtc gccatcgcgg 480 tcgtcaccgt cctgctgttc tcggcggccg gacacgcgct cggccttgag gcgatcttcg 540 gcgcattggt ggcgggagtc ctgctcgggc tgcccggagg cgtcgagccg gcccggctgg 600 cgccgttgcg taccgtggtg ctctccgtgc tggcgccgct cttcctggcc accgccgggc 660 tccgggtcga 
cctgcgcgcc ctcgccgacc cggtggtgct cgtggccggt ctggtgatcc 720 tggtgctcgc cgtcctgggc aagttctgcg gcgcgtacct ggcaggccgg ctgacgcgcc 780 agagccactg ggaggcggtc gccctcgggg cgggactcaa ctcacggggc gtcgtggaga 840 tcgtcatcgc gatggtcggg ctgcgcctgg gcatcctcaa caccgccacc tacacgatcg 900 tggtgctcgt cgccgtcctc acgtccgtca tggcgccgcc gatgctccag cgggcgatgc 960 gccggatcga gcacaatgcc gaggaggcgc tgcgggagga gaaccaggcg cagttgatca 1020 cccgcccggt ggtgcggtga ggccgctgcc cgggacgcca tgctgccccg tgcagcgtgc 1080 atcgcctgga gggaccgcgc tggtacgttc gggcacgcga cgacgcgggc ccgagggaga 1140 gaatggtgac ggtgcggttc ttggcgcgga ccctgcgcgg cctggaggag gtcgcggcca 1200 gggaggtggc cgggcgcggc tgcggggtcg agcaccagcg gcaccgtgag gtgtggttcc 1260 gcgcgagccg tccggagccg agcctgctcg acctgcgtac cgtggacgac ctgttcctcc 1320 tcgccggggt gaccgaggac gcggaccaca cgaaggcggc cctggctgcc ttcacccgcc 1380 tggcgcgcga cgctccgctg cggcaactgc tcgaggtgcg gaagacctac ggctactccg 1440 cccgggccgg gacactcgat gtggcggcgt ccttcctcgg ccgccgcaac tacaaccggt 1500 acgacgtcga ggaggccgtc ggccgcaccg cggcggcccg gttgggcctg cgcttccact 1560 cccgccgcaa cggcgaggcg ccgcctgagg gcagcctctc gctgcgggtc accgtcgagg 1620 gcacccaggc ggccctggcg gtgcggatag ccgaccggcc gctgcaccgg cgctcctaca 1680 agacatcctc cacgccgggc acgctgcacc cgccgttggc cgccgcgctg gcgtggctgg 1740 ccgggatccg cgccgggatg cgggtggtcg acccgtgctg cggcacgggc acgatcctgc 1800 tcgagtccgg cgggctgagt ccgggagccg tcctgctcgg cctggatcac gatccggccg 1860 cggtccgcgc ggctgtggcc aacgcggggg cactcgacgg ggtccgccgt ggttcggcag 1920 gtgggacgcc cggcgtcacc tgggcggtag gtgacgccgg gcgcctgcca ctgggcgccg 1980 ggacggtgga ccgcgtggtc agcaatccac cgtgggaccg tcaggtgctg gcccgcggtg 2040 ccctcgcgga cgatccggcg cggctcttcc gggagatccg ccgggtgctt gcagccgacg 2100 gcctggccgt gttgctgctg cacgagttcg aggaactgac cggggcggtc gccgccgccg 2160 ggctgggcgt cgacgacgtg cgggtggtca gcctgttcgg cacccatccg gccatggtga 2220 ccctgtccgg ctgagccgtc agggcacgac ctccagctgg gccatcatcc cgaggtacga 2280 gtgctcgggg tagtggcagt ggtacatgta ccggccgacg aagggcgcgt cgaaggtgac 2340 ctggaagcgg acggagccct 
tgggcgacac gtacaccgtg tccttgagac cggtgtcctc 2400 cggagccggc ggcccgccgt tgcggccgag cacctggaag tgcaccaggt gcaggtggaa 2460 gggatggtcg aaggggtacg gatcggtgtc gccgttgacg atgttccaga tctccgtggt 2520 gccccgcttg acctggatgt cgacccggtt ggggtcgaac accttgccgt cgatgaaggc 2580 cgtcggcggc cggccggaca tgtcgaactt cagttccacg gtccgctcca ccgtcggcgt 2640 gcccagcggc ggcagctcgc gcaggcggtc cggcacgcga ctggtgtcga tgaccctggt 2700 ggaccccacg tcgaagcgca ggatcgggtt gtcgccgtcg aacaggtaga cggggccgcg 2760 tccgcggtgt tcggcgaagt cgatcacgat ctcgacccgt tcaccggagg agaccgccag 2820 ctcggtgtgg gtggtgggag cgggaagcag gccgctgtcc gaggcgatcc ggaccatcgt 2880 ctggccgccg aggttgagcc ggaagacgtg cttgagggcc gcattgagca gccggaaccg 2940 gtagcggcgg ggagccacct ggaagtacgg ctgaaccttg ccgttggcca ggatcgtcgt 3000 gcggtcgtcg gggttgccga agacgaacgc accggattcg tcgaactgcg cgttgcgcag 3060 caggatcggg acgtcgtagc gccccttggg caggtgcagg tgccgctcgg cggggtcctc 3120 gatgaggtag aagccgtgca ggccgcggta gacgtggtcg gcctcgtagt cgtgggtgtg 3180 gtcgtggtac cacagcgtgg ccccgcgttg gacgttcggg tagtcgtaga cccgcgagcc 3240 gcccggctcg atgatgtcca tcgggtgccc gtcactgctg gccggcacgc ggccaccgtg 3300 caggtgcacg ttcgtgtggc tgtccagccc gttggtgtag gtgatccgga cggggcggtt 3360 ggtccgcgcc cggatcgtcg ggccgacgaa cgagccgccg taggtgtagg ccggggtgga 3420 cagtcccggc aggatctgga cctggg 3446 <210> SEQ ID NO 5 <211> LENGTH: 345 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 5 Gly Leu His Val Asp Leu Arg Leu Ile Arg Arg Arg Ala Gly Thr Val 1 5 10 15 Ala Thr Val Thr Met Gly Gly Leu Leu Leu Pro Leu Gly Leu Gly Val 20 25 30 Ala Thr Gly Leu Leu Val Pro Ala Ala Leu Leu Ala Ala Thr Asp Gln 35 40 45 Arg Val Met Phe Ala Phe Phe Leu Gly Val Ala Met Ala Val Ser Ala 50 55 60 Val Pro Val Ile Ala Lys Thr Leu Thr Asp Met Arg Leu Met His Arg 65 70 75 80 Asp Val Gly Gln Leu Ile Leu Ala Ala Ala Ser Leu Asp Asp Ala Phe 85 90 95 Ala Trp Phe Met Leu Ser Leu Ile Ser Ser Met Ala Val Ser Ala Leu 100 105 110 Thr Val Gly Asn Val Leu Ala Ser Leu Leu Asn Leu Val Leu Phe Ile 115 120 125 Val Ala Ala Ala Leu Ile Gly Arg Pro Val Val Arg Arg Ala Met Arg 130 135 140 Trp Ala Asn Ala Gln Ile Asp Val Gly Pro Ala Val Ala Ile Ala Val 145 150 155 160 Val Thr Val Leu Leu Phe Ser Ala Ala Gly His Ala Leu Gly Leu Glu 165 170 175 Ala Ile Phe Gly Ala Leu Val Ala Gly Val Leu Leu Gly Leu Pro Gly 180 185 190 Gly Val Glu Pro Ala Arg Leu Ala Pro Leu Arg Thr Val Val Leu Ser 195 200 205 Val Leu Ala Pro Leu Phe Leu Ala Thr Ala Gly Leu Arg Val Asp Leu 210 215 220 Arg Ala Leu Ala Asp Pro Val Val Leu Val Ala Gly Leu Val Ile Leu 225 230 235 240 Val Leu Ala Val Leu Gly Lys Phe Cys Gly Ala Tyr Leu Ala Gly Arg 245 250 255 Leu Thr Arg Gln Ser His Trp Glu Ala Val Ala Leu Gly Ala Gly Leu 260 265 270 Asn Ser Arg Gly Val Val Glu Ile Val Ile Ala Met Val Gly Leu Arg 275 280 285 Leu Gly Ile Leu Asn Thr Ala Thr Tyr Thr Ile Val Val Leu Val Ala 290 295 300 Val Leu Thr Ser Val Met Ala Pro Pro Met Leu Gln Arg Ala Met Arg 305 310 315 320 Arg Ile Glu His Asn Ala Glu Glu Ala Leu Arg Glu Glu Asn Gln Ala 325 330 335 Gln Leu Ile Thr Arg Pro Val Val Arg 340 345 <210> SEQ ID NO 6 <211> LENGTH: 385 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 6 Val His Arg Leu Glu Gly Pro Arg Trp Tyr Val Arg Ala Arg Asp Asp 1 5 10 15 Ala Gly Pro Arg Glu Arg Met Val Thr Val Arg Phe Leu Ala Arg Thr 20 25 30 Leu Arg Gly Leu Glu Glu Val Ala Ala Arg Glu Val Ala Gly Arg Gly 35 40 45 Cys Gly Val Glu His Gln Arg His Arg Glu Val Trp Phe Arg Ala Ser 50 55 60 Arg Pro Glu Pro Ser Leu Leu Asp Leu Arg Thr Val Asp Asp Leu Phe 65 70 75 80 Leu Leu Ala Gly Val Thr Glu Asp Ala Asp His Thr Lys Ala Ala Leu 85 90 95 Ala Ala Phe Thr Arg Leu Ala Arg Asp Ala Pro Leu Arg Gln Leu Leu 100 105 110 Glu Val Arg Lys Thr Tyr Gly Tyr Ser Ala Arg Ala Gly Thr Leu Asp 115 120 125 Val Ala Ala Ser Phe Leu Gly Arg Arg Asn Tyr Asn Arg Tyr Asp Val 130 135 140 Glu Glu Ala Val Gly Arg Thr Ala Ala Ala Arg Leu Gly Leu Arg Phe 145 150 155 160 His Ser Arg Arg Asn Gly Glu Ala Pro Pro Glu Gly Ser Leu Ser Leu 165 170 175 Arg Val Thr Val Glu Gly Thr Gln Ala Ala Leu Ala Val Arg Ile Ala 180 185 190 Asp Arg Pro Leu His Arg Arg Ser Tyr Lys Thr Ser Ser Thr Pro Gly 195 200 205 Thr Leu His Pro Pro Leu Ala Ala Ala Leu Ala Trp Leu Ala Gly Ile 210 215 220 Arg Ala Gly Met Arg Val Val Asp Pro Cys Cys Gly Thr Gly Thr Ile 225 230 235 240 Leu Leu Glu Ser Gly Gly Leu Ser Pro Gly Ala Val Leu Leu Gly Leu 245 250 255 Asp His Asp Pro Ala Ala Val Arg Ala Ala Val Ala Asn Ala Gly Ala 260 265 270 Leu Asp Gly Val Arg Arg Gly Ser Ala Gly Gly Thr Pro Gly Val Thr 275 280 285 Trp Ala Val Gly Asp Ala Gly Arg Leu Pro Leu Gly Ala Gly Thr Val 290 295 300 Asp Arg Val Val Ser Asn Pro Pro Trp Asp Arg Gln Val Leu Ala Arg 305 310 315 320 Gly Ala Leu Ala Asp Asp Pro Ala Arg Leu Phe Arg Glu Ile Arg Arg 325 330 335 Val Leu Ala Ala Asp Gly Leu Ala Val Leu Leu Leu His Glu Phe Glu 340 345 350 Glu Leu Thr Gly Ala Val Ala Ala Ala Gly Leu Gly Val Asp Asp Val 355 360 365 Arg Val Val Ser Leu Phe Gly Thr His Pro Ala Met Val Thr Leu Ser 370 375 380 Gly 385 <210> SEQ ID NO 7 <211> LENGTH: 401 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 7 Gln Val Gln Ile Leu Pro Gly Leu Ser Thr Pro Ala Tyr Thr Tyr Gly 1 5 10 15 Gly Ser Phe Val Gly Pro Thr Ile Arg Ala Arg Thr Asn Arg Pro Val 20 25 30 Arg Ile Thr Tyr Thr Asn Gly Leu Asp Ser His Thr Asn Val His Leu 35 40 45 His Gly Gly Arg Val Pro Ala Ser Ser Asp Gly His Pro Met Asp Ile 50 55 60 Ile Glu Pro Gly Gly Ser Arg Val Tyr Asp Tyr Pro Asn Val Gln Arg 65 70 75 80 Gly Ala Thr Leu Trp Tyr His Asp His Thr His Asp Tyr Glu Ala Asp 85 90 95 His Val Tyr Arg Gly Leu His Gly Phe Tyr Leu Ile Glu Asp Pro Ala 100 105 110 Glu Arg His Leu His Leu Pro Lys Gly Arg Tyr Asp Val Pro Ile Leu 115 120 125 Leu Arg Asn Ala Gln Phe Asp Glu Ser Gly Ala Phe Val Phe Gly Asn 130 135 140 Pro Asp Asp Arg Thr Thr Ile Leu Ala Asn Gly Lys Val Gln Pro Tyr 145 150 155 160 Phe Gln Val Ala Pro Arg Arg Tyr Arg Phe Arg Leu Leu Asn Ala Ala 165 170 175 Leu Lys His Val Phe Arg Leu Asn Leu Gly Gly Gln Thr Met Val Arg 180 185 190 Ile Ala Ser Asp Ser Gly Leu Leu Pro Ala Pro Thr Thr His Thr Glu 195 200 205 Leu Ala Val Ser Ser Gly Glu Arg Val Glu Ile Val Ile Asp Phe Ala 210 215 220 Glu His Arg Gly Arg Gly Pro Val Tyr Leu Phe Asp Gly Asp Asn Pro 225 230 235 240 Ile Leu Arg Phe Asp Val Gly Ser Thr Arg Val Ile Asp Thr Ser Arg 245 250 255 Val Pro Asp Arg Leu Arg Glu Leu Pro Pro Leu Gly Thr Pro Thr Val 260 265 270 Glu Arg Thr Val Glu Leu Lys Phe Asp Met Ser Gly Arg Pro Pro Thr 275 280 285 Ala Phe Ile Asp Gly Lys Val Phe Asp Pro Asn Arg Val Asp Ile Gln 290 295 300 Val Lys Arg Gly Thr Thr Glu Ile Trp Asn Ile Val Asn Gly Asp Thr 305 310 315 320 Asp Pro Tyr Pro Phe Asp His Pro Phe His Leu His Leu Val His Phe 325 330 335 Gln Val Leu Gly Arg Asn Gly Gly Pro Pro Ala Pro Glu Asp Thr Gly 340 345 350 Leu Lys Asp Thr Val Tyr Val Ser Pro Lys Gly Ser Val Arg Phe Gln 355 360 365 Val Thr Phe Asp Ala Pro Phe Val Gly Arg Tyr Met Tyr His Cys His 370 375 380 Tyr Pro Glu His Ser Tyr Leu Gly Met Met Ala Gln Leu Glu Val Val 385 390 395 400 Pro <210> SEQ ID NO 8 <211> LENGTH: 14252 <212> TYPE: DNA <213> 
ORGANISM: M. carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (459)..(1280) <223> OTHER INFORMATION: ORF 5 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (2677)..(3747) <223> OTHER INFORMATION: ORF 7 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (1280)..(2566) <223> OTHER INFORMATION: ORF 6 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (3899)..(4774) <223> OTHER INFORMATION: ORF 8 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (4893)..(5303) <223> OTHER INFORMATION: ORF 9 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (5365)..(6306) <223> OTHER INFORMATION: ORF 10 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (6350)..(7204) <223> OTHER INFORMATION: ORF 11 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (7371)..(8198) <223> OTHER INFORMATION: ORF 12 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (8304)..(9098) <223> OTHER INFORMATION: ORF 13 (ngative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (9462)..(10493) <223> OTHER INFORMATION: ORF 14 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (10665)..(11384) <223> OTHER INFORMATION: ORF 15 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (11387)..(12700) <223> OTHER INFORMATION: ORF 16 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (12971)..(14185) <223> OTHER INFORMATION: ORF 17 (negative strandedness) <400> SEQUENCE: 8 cctagtcagt ttccactctt cgcgctctgc cggcggcgcc ggcacccgcg atcctcggcc 60 cctgtcctgg cggatccgcg gttgtggggc aaaccctagt cagttgtcag gcacggctcg 120 atagggtcgg atcaggcgag cccaaggtca atgtccgcgc cttcggcggg cccggggtca 180 ggtcgtgcgc cgcggacgtg gcgaggcttg acattctcgg ccgaaaggcg aacctgccga 240 cgctgacagc gcggaagtcc gcgatttcgc cgcaacccga aggggcaggc tcagcccatg 300 accatggtgg tacggcaccc ggccgagcgg gtcgagtgca gcccgatcgc ccctcggcgc 360 gacgccgctg 
gcgtgacgcc ggtcccgctc accccgagcg ctgcgcgtcc ccggtccgac 420 cggacaccgc cggtccaccg tgggcaggag ccccggcggt gatcggcttg ctgggccggc 480 tcccgggggt gaacgccgtg ctcggggccg tctcgaagca gcaggccgag ccgaccctcg 540 acgaggtgat ggccgaacgt ttccgcgaac ggacggatcc gcgccggggc gactgggcct 600 acgcgcactt catcgatctg cgcgacgcgc tcgccgaggt gctgggcgac gcttccggca 660 actggctcga ctacggcgcg ggcacgtcgc cgtaccggaa cctgttcacc gcggccgatc 720 tgaagacggc cgacattccc ggcggcgagt cctacccggc cgactacgcg ctcgaccacg 780 acggacgctg tccggcaccc gacgcgacgt tcgacggcgt gctgtccacc caggtcctcg 840 agcacgtgac cgacgcggac gcctacctgc gtgaggcgct gcggctgttg cggcccgggg 900 gccggctggt gctgtccacc cacggcgtgt gggaggagca cggcggtcag gacctctggc 960 ggtggacggc ggacggcctg gcccggcagg ccgaactggc cgggttcgcc gtcgaccggg 1020 tgctgaagct gacctgcggg ccgcgaggac tgctgctcct gctgcgctgg tacggacgcg 1080 agaacggctg gcccgcgatc ggcccggtcg ggttggtgct gcgctccctg tggttggtgg 1140 accacctgct acccagctcc ctggacacgt atctggatcg cgcattcggc gatctcggga 1200 gacgcgaggg cccggacgcg ccgttctatc tggaccttct gctcgtcgcc cggaaacccc 1260 acacgaagga gaccgctacg tgagtcggac cgcatcagcg tatgacgaga gcgtggtacg 1320 acaggtgaac gcgcggacgg actgccgggt ctgcggcggc acgctccgta cgatcctcga 1380 cctcggcgac cagtatctgc aggggtcctt cgtcaagccc gggacacccg agccgccggc 1440 ggtcaagttc ccgctcgaac tcacccgctg cgtcggcgac tgcggcctcg tgcagctgcg 1500 gcacaccctg ccccccggtc tgctgtacga cacctactgg taccgctcgc gcatcaacga 1560 caccatgcgg acgcacctca gggagatcgc cgaatccggg gtggcggcac tcggccggcc 1620 gctccggcgg gccctggaca tcggttgcaa cgacggcacc ctgctgcaga acctgcgcgg 1680 ggccgaactg tgggggatcg acccgtcgaa cgcgaccgac gacgcgcccg agggcatcac 1740 cctggtccgg gacttcttcc ccagcccggc gctggacgag cacgccggga cgttcgacgt 1800 cgtcacgtcg atcgcgatgt tctacgacgt cgaggacccg gtggcgttcg cccgcgcggt 1860 ggagcggatg ctcgctcccg gtggcgtctg ggtggtcgag gtcgcctacc tgcgcgagat 1920 gctggcgacc accgggtacg acagcatctg ccacgaacac ctgtcgtact actcgctctc 1980 caccctgacc ttcatcctgc gtcaggccgg gctcgagatc aggcgggcaa gcgtcaacgg 2040 gatgaacggc ggctcgatct gctgcgtcgt 
cacccgggcc accgagggcg ccgaccacgc 2100 cgacgggtcg gtggcggaac tcgccgcgca ggagcgcgag ctgggactgg accagagcga 2160 gccgtacgag cggttcgccg acaacgtgcg ggcgcaccgc gacgaactcg tcaagatgct 2220 gcatggtctg cgcgacagcg gaagcaccgt gcacgtctac ggcgcctcca ccaagggcaa 2280 caccctcctg cagtactgcg ggatcgaccg cacgctgatc ccgtacgccg ccgagcgcaa 2340 cccggacaag gtcggcgcgc ggaccctcgg tacggacatc gagatcatca gcgaggccga 2400 ctcgcgggcc cgccgcccgg accactacct ggtgctgccg tggcacttcc acgacgagat 2460 cgtggcgcgc gaggcggcca cggtggcggc cggaaccaag ctgatcttcc cgctgcccag 2520 cctgcgggtc gtgcaggcgt cgcggaccga ctcgcgggtg gggtcgtgac cggctcgctc 2580 gtccagcggc tgctcgccgc ggcggacgct cccgacccgg gcgtgcacct cgcggccgag 2640 gatccggaag cagtggtggc cgtggccatg gcggaggtgg cgggccggac cgtcctctac 2700 ccgggcccgg cgacgccgct gaccgtacag atcgacgtgg acgtcgctga cgcgcgacag 2760 atctcctacc tcctggcggc cggtccgcac ggcgcccagg cgcggccggg ccggaccgac 2820 gacccgtggg tgcgagtccg gtacgacctg gcggcgctgg tgcgggacgt gttcgggccg 2880 gccggcccgt ggaccggtac cggccgggac gtggtgatga aggacgagcc cggcccggtg 2940 gagtacaagc ccgacgaccc gtggctggta cggcgggaag aggcgacccg cgcggcctac 3000 caggctctcc gcgcgtgcga gccgtaccgt ggcgacctgg ccgcgctggc gctgcggttc 3060 ggctcggaca agtggggcgg gcactggtac acctcccact acgagcggca cctcggcggg 3120 ttccgggacc accggctgaa cctgctggag atcggcatcg gcggctacca cgagccggac 3180 gccggcgggg cctcgttgcg catgtggaag cactacttcc accgcggcag cgtgtacggg 3240 ctcgacgtgt acgacaagtc gctgctggac gagccacggc tcaccacgct ccgtggtgac 3300 caggccgacc cggcgatgct cgccgacctc gcgcggcggc acggcccctt cgacatcgtg 3360 atcgacgacg gcagccacgt cagcagccac gtcatcaccg cgttccaggc gctcttcccc 3420 cacgtgcgcc ccggcggcgt gtacgtgatc gaggacctgc acacctcgta ctggccggag 3480 tggggtggaa acggcaccga cctgtccgac cccgccacgt cggtcggctt cctcaagaca 3540 ctcgtcgacg gtctgcacca ccgcgatcgc ctccacgacg gtccgtacca gccgacgtac 3600 ccggacctga ccgtgacggg gctgcatctc taccacaatc tcgcgttcgt cgagaagggc 3660 cgtaacaccg aacaggccaa cgccacgtgg cggccgcgga acgacccgat gcgcgatctg 3720 ccgaaaccgc agcggtcagc gggggagtga ggactcatgc 
gtgtcgtgtt ggtgacgatg 3780 gcactgcggg tgccgacgga tccgagccac tggatcacgg tcccgccgca gggctatgcc 3840 ggcatccact ggatcgtggc gaaccacatg gacggcctgc tcgaactcgg cccacgaggt 3900 gttcctgctc ggcgcgccgg gcacgacgcc ggtcgcaccg gcggtcaccg tggtggacgc 3960 gggcgagatc gaggacatgc acgcctggct gaacggccct gaggcggcca cgatcgacgt 4020 cgtccacgac ttctcctgcg ggcagatcga tcccgaccgg cttccccggg gcatggcgta 4080 cctgtccacc caccacctga ccggcaagcc gaagtatccg cgcaactgtg tgtacgcctc 4140 gtatgcccaa cgggcccagg cggagaacga cgtcgcgccg gtggtccgca tctcggtgaa 4200 ccaggcgcgc tacccgttcc gggccgacaa ggacgactac ctgctctacc tcggtcggat 4260 ctcggaatgg aagggcacct acgaggcggc cgccttcgcc agcgccgccg ggcgtcgcct 4320 cgtcgtggcg ggcccgtcct gggaagagga ctacctggcc cggatcctgc gcgacttcgg 4380 ggacagcgtc gaccttgtcg gcgaggtggg gggcgaccgg cggctcgacc tgatctcccg 4440 cgcgaccgcg atgatggtcc tgtcgcagag caccatgggg ccgtggggcg tggtgtggtg 4500 tgagcccgga tcgaccgtgg tgtcggaggc cgcggcgtgc gggacgcccg tcatcggcac 4560 gccgaacgga tgcctggccg agatcgtgcc cgcggtcgga acggtcgtgc ccgagggcgc 4620 ggacttcacc gtcgaacagg cccggagcgt cgtggcggcg ctgcccgggc cggacgcggt 4680 ccgggcggcg gcgctggagc ggtgggacca cgtcgtggtg gccaaggagt tcgaggccat 4740 ctaccacgac gtgctcgccg gtcgtacctg gacgtgacat ccggctctcc cagtcggtgg 4800 gacgacgcca gccggcggcg acgcacctgc cagtcggccg gcaccgagta cccgtgatgt 4860 ccctccgggc ccactgacga atggagttca tcgtgaagat cgaggtcctg cagccgagct 4920 gcaacctgga caccgtccgg gacggccggg gcggcatctt cacctgggtg ccaccagagc 4980 cgatcctgga gttcaacctc atcaccatgc accccggcaa ggtccgtggg ctgcactacc 5040 acccgcactt cgtggaatac ctgctgttcg tcgacgggga gggggtgctg gtgaccaagg 5100 acgatccgga cgaccccgac tgcccggagg agttcatcca cgtcgcccgg gggacgtgta 5160 cgcgcacgcc ctccggagtg atgcacgcgg tctactcgat cacgtcgctg tccttcgtgg 5220 ccatgttgac ccgaccgtgg gacgagtgtg atccgcccat cgtccaggtg cagccgctgc 5280 cgcacaccct cgcggcgaac ggctgagcgc ccgagcgggg cgacccgctg gtgaaccgtt 5340 gacgatggcc ggaggcgcag gtcaccggct ttccaccggg tcgccttcca gcgcgtggcg 5400 ccagagcgcc ccgatcgcct cggacagcgt ccgccgcggc gtccagccga 
gcagctcacg 5460 cgccggccgc aggtcgaccc gggtccagtc ctcggcgcct gcggccggcg ccggcagttc 5520 gaccacggtg gccggcacct ggctgatgtc gacgagcatg gcgaccagcg tgcgcacgga 5580 caccgactcg ccccggccga tggcgatggg aacggtcgtg ccgggcaccc ggatcgcggc 5640 ccggatcgcc tcggcgacgt cgcgcacgtc cacgtagtcg cggcgggcgt ccagcgcggt 5700 cagctcgatg ttcgcgtgcc cgccacgacg tgccgcctcg accagactgc cggccaccag 5760 gccgagcagg ctggccggcg gcacaccggg gccggtgacg ttggccaggc gcaaaaccac 5820 cgggtccacg gtcccctgcg ccgcggcctc cagcacggcc tcggtcgcgg cgagcttgaa 5880 ccggtcgtac tcgctggcgg gtcgggacga ccgctgggcg gccccgggcg cgtccggtgc 5940 ggcgagccca cactcgagca ccgatccgag gtgcacgaac cgtggcacca acgaggtcat 6000 cgccagcgcc gtcaggatgg cctcggtcgc cccgacgcaa ctcgcctcaa ggccccgtcc 6060 ggtcagaccc cacttgccgc ccgtggcgtt gacgatcgcc gccgggcgct cggcggccag 6120 catcgcggcc agctccccgg gccgtacccc cgagacgtcg atcgcccgga accggtaccc 6180 ggtcgtggcc ctgggcgcgt ttctcgccac gacgagcacg tcgtgccccg cggccacgag 6240 gttcttcgcg acctggcgcc ccaggaagcc ggtgcctccg aacacgatga cgcggttgtc 6300 gctcactcgt acctcctgga cgacgactcg accggttggc ggacggtcaa tcgggacaga 6360 gctcgatcca gtggaagccc gtggacggca gcgacccgac ctcggcgatc cgcagacccg 6420 ccttggcgca gaggccggcg aagtcgtccc tcgtccgctc catcccctga ccgttgacga 6480 gcaggcccag gtcggtgagg taggtggtgg ggctctgccc gggcagcacg gtgtccggca 6540 tcaggtggtc gacgagcagg atgcgtccct gttccctggc ggcgcgggca cagttgcgca 6600 ggatcaccgc ggcatgctcg tcgtcccagc cgtggatcac gctcttgagc aggtacaggt 6660 cgccatcgcg cggcacctcc gagaagaagt ctcccgtttc gatccggcag cgggccgtca 6720 gacctgccgc ttccagggtc tgctcggccg cgtgcacacc ggacgggctg tcgaagagca 6780 ccccgcccag ccgggggtgc tcggccagga tctcgacgag cgacgtcccg tcgccaccgc 6840 cgacatcgac gaccgtccgg aaacggccga agtcgtacgc gccggccagc accctggcga 6900 ctccccgggt gccctgactc atcgcggcgt tgtacagctc ggacagctcc ggatgggacg 6960 acaggtagcc gaagaagtcg atcccgaacg cctcgtcgaa ggccgggccg ccggtgcgca 7020 ggctgaactc gaggttctgc caggcgctcg tcatcgtcgg atcggtcagc atccgggcca 7080 gcgggtacat cgatcccggc cggtcgctgc ggaacagcgc gcccacgggg gtgacggtga 
7140 accggccggg gcggggttcg gcgagcaggt cgagcgcggc gagcgcacgc agcagccgca 7200 gcatcggacc ctcctggaag ccgtactcgg cggcgacacc tgccgcgtcc gttcctcgtc 7260 gccgatcgcg tcgggcagcc gcagccggac cgcgagcgcg accacgtgcg tcgcacatcc 7320 cgccgaacac cagccgcagc actgccggcc acggggagct cgcagggcta tccacgggcg 7380 agtcccgccc ggatcgcccg ctcgaccggg acgtactgcc cgtcggcgcg gtccaggtcg 7440 aactcgccgg tcggttgcag cgagggctgg gccatgaacc gggggctggt gccggtgttg 7500 gtgatcggcg tgtgcaccag gaacggatgg cagaggtagg cgtcgcccgc ccgcccggtg 7560 gccatcgcga gggggcggtc cgcgcccacg tcgcggcagg cgaggtaggt cccctcggcg 7620 ccgtagggcg ccagcagggg cggcacgtcc aggtgcgaac cgacccggat cagcgtgggc 7680 gcgtcacgct cgccggtgtc ggagtagagg agcagcacca gcagggcccg gccacgcgaa 7740 accaggttgc tgcggaagat ccggtcgtag tccggcggca cgagcgggag ctcgccctcc 7800 cagtcctggc cgctgctcat ggcggccacg ccctcggggc tgaggaagct ggcgtcgatg 7860 tgccagccgt agtcctcggc ctgttccgga tcccggtcca ccgggaaacg gatcgggaac 7920 gtcccgacca tgtccagcgg tcgccaccgg cccgcaccga cgagctggtc gtacgcctcg 7980 accaacgccg gggtgttggc gctctgcacg aacgcgtcgt cgccccgcag accgagccgg 8040 acgacctccc tggtccaggt cgagctgtcg tcgggatcca cgtcgagttg cttccagagc 8100 agattgcggc actcggcggc gagcgcggcg gggaaagcgt tcggcacccg gacgaagccg 8160 tcggcgacga agctctcgat ctgctcggct gtcagcatgc gcccctcctc atgaaactcc 8220 cctgccggac cggttatatc ctgacggcgc cgacggtagg cagttcctgc ggaagactag 8280 cgattccacc agaggtgcgg tcacgcccgt tgtcgggtga tctcgtacag cgtgatcgag 8340 gcggcgaccg tcgcgttcag cgaactcgcc gacccgacca tggggatccg gagcaccacg 8400 tcgcagttgt tggcccagaa actgctcatt ccgcttgtct cgttgccgac gacgacggcg 8460 gtcggcccgg tgaagtcgtg attccagatg tcggtgaccg cgtcctcact cgtgccgacg 8520 agcgtcatcg cgtcgatcgt ccgcagccat tccagcacgg cggtcggggt ctcggcccgc 8580 accgccggaa cggcgaacag cgagccgcgg ctgccccgaa ccgtcttcgg gtcgtagagg 8640 tcggccgccc ggcccgcgac gatcaccccg tcgatgccca gcgcgtcggc cgagcgcagg 8700 agcgagccca cgttgcccgg actgatcgga cggtcgagca ccaccagaac gccgttcggg 8760 cgtacgcgga tccgggtgag gtcgtccgga gggatcgcga cgacggcgat cagctcggtg 8820 
gtgtcctcgt cctttcccgc gagctcgtgc agcagctccg gggacagccg gatcacctcg 8880 tcggcgacct gctccctgac caggtcacgc gcccactgcg atctcaggtt ccccgcgtgc 8940 agcagcgccc ggatccgcca gtggtgcgcg atcgcctcgt tgatcgggcg tacgccctgc 9000 accaggaact cacccagccg gtgccgcgtg ttccggttgg tcagcagcgc ctcccactgc 9060 tggaatctgg cgttgcgccg ctccagccgg gcctccacgc cacgcctccg cggcccttct 9120 ccgatcttgg acatggctga gacccttccc acgaacccgg cttgcgtgcc ctgcggcggg 9180 acaatcatgc cggtcgtccg cacgggccgg cgggccgggg acaagtgtcg gcgtcggctg 9240 gggtggcacc cgccgtgttc tcggcggcgg ccccagcccg atgccggcga acgcatcgtg 9300 ctccgtcggc gggaaatacc acacgaagat ccgttccaca tctaggtgga attccagact 9360 agttgcgatg cggccatcat agagtcgtgg tccggtggac gaaggccggg gcggctccga 9420 gctgcggtga tgatcaacat gaattgcgag gaggagaatt catgcggaca ccggacatgt 9480 tcatcggcgg tgtcgggacg ttcattccgc cgcgggtgag cgtcgactgg gcggtcgccc 9540 ggggcctcta ttgggccgag gacgccgagg cgcacgaact cgtcggcgtc gcggtcgcgg 9600 gcgacatgcc tccccccgag atggcactcc gggccgcaca gcaggcggtc aagcggtggg 9660 gcgggtcgcc gaaggagttc gacctgctgc tgtacgccag cacgtggcac cagggaccgg 9720 acggctggcc gccgcagtcg tacctgcaac ggcatctggt gggcggcgac ctgctcgccc 9780 tggagatccg gcagggctgc aacggtctgt tcagcgcgat ggaactcgcc gccagctacc 9840 tgaccgccgt tccggaacgc acgagcgccc tgctcgtcgc ggcggacaac tacggcacgc 9900 cgctgatcga ccgctggtcg atgggacccg gcttcatcgg tggcgacgcc gcctcggcca 9960 tcgtgctgac caaacaaccg gggttcgccc ggctgcgttc ggtgtgcaca cggacgatga 10020 cgaccgccga agccctgcac cgcggcgacg agccgctgtt cccgcccagc atcacggtcg 10080 gccgcaccac ggacttcagc gcccggatcg gccagcagtt cgccagccgc agcccggcgg 10140 ccgcagccat ggccgacgtg ccgcagcggg tcgtcgagct ggtcgaccag gcgctggcgg 10200 aggccgagat cgggatcggc gacatcgccc gggtggggtt catgaactac tcccgcgagg 10260 tggtcgagca gcgggtgatg acgatgtggg acctgccgat gtcgcgttcg acctgggagt 10320 acggtcgcgg gatcgggcac tgcggcgcca gcgacaccat cctgtccttc gatcacctgg 10380 tgcgcacggg ggagctccgg ccgggcgacc acatgttgat gctgggcacc gcacccggcg 10440 tcgtgctgtc ctgtgtcatc gtccaggtcc tcgaatcgcc ggcctggacg aagtgacgcc 10500 
gggcaggcgg gggacccccg ccccggcgtc gggtctgcgg cggtggcccg gaccacgacg 10560 gccgacggcc gtgggcccgc tccgcccgtt ccggaggccg gagcgtccag gtgcccgccg 10620 gcacctggac gctcacaccg agggcgggtg gtccacgtcg cctacttctg gtcgcggcgc 10680 aggatcaggt agcagacccc gtcctcgttg acgaactcgt ccagcagcgg cgtcaggccg 10740 gcctcgacgg ccaacgagga gatcacgtcg atgccgtgga aggagaggaa acccttgaac 10800 tcgaccgggt tcgccgtcag gtaccgctcc gcgtagaagt ggaagaagct gcgcgtcgac 10860 gcgccgatgt cgaggaagtt gaccccgaac ctgccgccgg gccgcaggat ccgggcgatc 10920 tgacggaagt agtggaagaa ctcgaagacg ttgaggtgaa tgaacacgtt cagcgaaaag 10980 cccgcgtcga actccgccga cggcaacgcc gccaggtagt cgttgtcgat gtggtggtag 11040 tcgacgttgg catggtcctg gcaggtgacc cgtgccttgt ccaggaagga ccggctgacg 11100 tcggtgcaca gcatccgtcg caccgaaggc gcgaggacgt tggccatgat cccctcgccg 11160 ctgccgatct cgaagatcga cgactccggg gtgatcccga ggcgctcggc catccacttc 11220 gcccggtcga cgcggtcctg caggtactcg tcgcggggct gggtgccggc gagctggatc 11280 tgcatctcgt ccggcgtcct ccactcccag accatgttga ggtcgcccat gctccgcagc 11340 ggcggcttgg ggccggtggc gggtgttcct tgcgcgttgc tcatcagacc tcgctcacgg 11400 tgtcctgggt gatttcctta cgttgcggcc ctcgcccggt caccaacccg gcgacgtcgg 11460 aggcggtcag ctcaccctcg cgggccaggg tgacgagcag gtcggcgacc tgcgcggcgg 11520 tcggcgcggc accgacggac tcgctgagct cgaccgcccg gcgccggtac cggtggtcga 11580 acagcacctc accgatggcc ttgtcgatcg cgtcgcggtc gatcagcagc ccgggcaacg 11640 tcttggtcgc tccctgcgga tccagccgcc gaccgtagat ctgcccgtcg aagttcagcg 11700 cgagtgacag ctgcggcacc cccatggcga tgccgttcat caggcagttc gcgctgccgt 11760 ggtggatgag caggtcgcag tcgggcagga tcagttcgag tgggcagttg cgcaggaccc 11820 ggacgttcgg tggcagcgtg cccatcgcgt ccacctcgga cagcgccgcc gtgagcacca 11880 cctcggtggc cagttgcgcc gcggtctcga ccgcctgtcg aagggccggc agccgctcgc 11940 cgaacacccc cgtcgcggag ttgccccaca ccacgcacac ccgcttgccc ttgaccgggc 12000 ccaacagcca ggggtcgacg tcctgagatc cgttgaaggg gtggtatcgg atgggtatcc 12060 gcagcgcgtc gcccatgggc ggaatggcca cgtccggcga cggatcgacg gcgtacttga 12120 tgtcgcgccg ggtccactgc acgccgtact tctcgaagca ggagagggga 
tcccccgcca 12180 tcatgttgag ccccggctcg gtctccaccg tgccgatgaa gcccggcccg aagaagacgc 12240 tgggcacgtc gttcaaaatg ccgaccagcg ccccctcgac cgccatgatg tcgtacacca 12300 ccaggtcggg acgccaggac gcggcgtagt cgacggcgtt gtcgaagctg cgctggaccg 12360 cggcgatgga cctcttccag aagtcgctca gcatgccggt gtcgaaatcg cgcacggagc 12420 cgagcgcctc accggtgaag ggatgcagcg gcagaggcat ctccccgctc tgcggcgggg 12480 tgttgatcgc caacgaccag taggccagcc gggcgctttc catcatgtcg gcggagtcga 12540 gcatcgacac cggcatcagg ccggtcgcct ggacccccga aacctgctgg ggcgggcagg 12600 cgacccggac ctcgtgcccg gccgcccgga acgcccatgc gagcggaacc atgcacatgt 12660 agtgtccagc ccagttggac acggtaaaca gaatccgcat cggaaccttt ccctagcgcc 12720 gtacctgcac gggtcgcttg ttcacgtgcc gagcccgatc accacacaag cgcgaatcga 12780 ccggcccggc gcgacaggct ccgctgcggt cggcggctgc ccgaccgaga gtagcggacc 12840 tggactagcg ttttccccac acctgatctt cggcggcaag gaaacgcctc gcatatgcat 12900 caaccattct tcgctctggg ccaggaactg tcgcggcacc gtacgaaatc gttgcggagg 12960 tcgtcgttca ggacaccccg tcgtccacca ggcgggtcgg atcgagcgag ttcatgaacg 13020 aacgcagggc gtcgagatgg gtcgcgggct tgtcccagcc gagctcggcg caccaggcga 13080 tcaggtccgt cagctcctgg gtggcgcgcg ccttctcctc ggccgacagg ctggcgaccg 13140 agagatcctg cggccactgc accacgttgg ccaggtccac ctccagcccc tcggtacgcg 13200 cgaactcgag cacgttgcgc aggtcccaca ggttgtgccg ctgcggggac acctggagcc 13260 agacgtcgaa gtcggaccgg agcaggcgca gattggccac gaagtccgcc cacttcccgc 13320 cggcccggat gtattcgaac acctcgccga cgccgtcgca ggaagccccg atgcccacgc 13380 tcttgaagtg ccgtaggagc tttatcgcgt tgtccgggga gacggtcagg ttcgagttgt 13440 actggatgtc gacgttgtgc gcgttcccgg tttccacgag cagctcgagc atggcgaaat 13500 gacccggttg caggaagggt tcgccgcccg cgaagtacag cttgcggatc aggtgcgcat 13560 tctcccgcag cgtcgcccac aactcgtcgt cgtcgcggta cgggtcgatg accgcggacg 13620 accacgacgg gcgttgcttg gcgccccagg aggaactcac cgggtaggtg cacatgacgc 13680 accgcaggtt gcagaggttg ccgaacctga tgtcgagaaa gaacgggaac tcctcgaccg 13740 tgccgtccgc ggcggtacgg gcggcgagcg catcgaggtc gtactcctgg tggaaccggc 13800 ggttgacgtt ctgccggtag gactgggcgc 
cgtggtcctc ccggaagtag cagtacttgc 13860 acgcctccac gcgctcgcca ccgagcatcg ccagccgggt ccgcttcatg ttggggctgt 13920 tgaaggcctc ccggatgccc atcacgcggt ccgggttgtc cttggcgtac cgcgagcccg 13980 gggagcaacc gatcgcgtcg tcgttcagcg cgaacgccgg ctcctcctgc tcgtcgtaca 14040 gctccgtgtg gtacatcgag tcgtcgacgc agcaccggcc gtagacgccg tcgatggagg 14100 cgcagaggtg gatccacggc agcacacacg cggtctgatc ggccgtcggg gacggggtgg 14160 cgtggctgtc gcccggaacg ctcatcggat gccccccgag ctcaccatcg ccagtactcc 14220 tcgtgcgcga agcgcagcgt gtcgatctcc gg 14252 <210> SEQ ID NO 9 <211> LENGTH: 274 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 9 Val Ile Gly Leu Leu Gly Arg Leu Pro Gly Val Asn Ala Val Leu Gly 1 5 10 15 Ala Val Ser Lys Gln Gln Ala Glu Pro Thr Leu Asp Glu Val Met Ala 20 25 30 Glu Arg Phe Arg Glu Arg Thr Asp Pro Arg Arg Gly Asp Trp Ala Tyr 35 40 45 Ala His Phe Ile Asp Leu Arg Asp Ala Leu Ala Glu Val Leu Gly Asp 50 55 60 Ala Ser Gly Asn Trp Leu Asp Tyr Gly Ala Gly Thr Ser Pro Tyr Arg 65 70 75 80 Asn Leu Phe Thr Ala Ala Asp Leu Lys Thr Ala Asp Ile Pro Gly Gly 85 90 95 Glu Ser Tyr Pro Ala Asp Tyr Ala Leu Asp His Asp Gly Arg Cys Pro 100 105 110 Ala Pro Asp Ala Thr Phe Asp Gly Val Leu Ser Thr Gln Val Leu Glu 115 120 125 His Val Thr Asp Ala Asp Ala Tyr Leu Arg Glu Ala Leu Arg Leu Leu 130 135 140 Arg Pro Gly Gly Arg Leu Val Leu Ser Thr His Gly Val Trp Glu Glu 145 150 155 160 His Gly Gly Gln Asp Leu Trp Arg Trp Thr Ala Asp Gly Leu Ala Arg 165 170 175 Gln Ala Glu Leu Ala Gly Phe Ala Val Asp Arg Val Leu Lys Leu Thr 180 185 190 Cys Gly Pro Arg Gly Leu Leu Leu Leu Leu Arg Trp Tyr Gly Arg Glu 195 200 205 Asn Gly Trp Pro Ala Ile Gly Pro Val Gly Leu Val Leu Arg Ser Leu 210 215 220 Trp Leu Val Asp His Leu Leu Pro Ser Ser Leu Asp Thr Tyr Leu Asp 225 230 235 240 Arg Ala Phe Gly Asp Leu Gly Arg Arg Glu Gly Pro Asp Ala Pro Phe 245 250 255 Tyr Leu Asp Leu Leu Leu Val Ala Arg Lys Pro His Thr Lys Glu Thr 260 265 270 Ala Thr <210> SEQ ID NO 10 <211> LENGTH: 429 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 10 Val Ser Arg Thr Ala Ser Ala Tyr Asp Glu Ser Val Val Arg Gln Val 1 5 10 15 Asn Ala Arg Thr Asp Cys Arg Val Cys Gly Gly Thr Leu Arg Thr Ile 20 25 30 Leu Asp Leu Gly Asp Gln Tyr Leu Gln Gly Ser Phe Val Lys Pro Gly 35 40 45 Thr Pro Glu Pro Pro Ala Val Lys Phe Pro Leu Glu Leu Thr Arg Cys 50 55 60 Val Gly Asp Cys Gly Leu Val Gln Leu Arg His Thr Leu Pro Pro Gly 65 70 75 80 Leu Leu Tyr Asp Thr Tyr Trp Tyr Arg Ser Arg Ile Asn Asp Thr Met 85 90 95 Arg Thr His Leu Arg Glu Ile Ala Glu Ser Gly Val Ala Ala Leu Gly 100 105 110 Arg Pro Leu Arg Arg Ala Leu Asp Ile Gly Cys Asn Asp Gly Thr Leu 115 120 125 Leu Gln Asn Leu Arg Gly Ala Glu Leu Trp Gly Ile Asp Pro Ser Asn 130 135 140 Ala Thr Asp Asp Ala Pro Glu Gly Ile Thr Leu Val Arg Asp Phe Phe 145 150 155 160 Pro Ser Pro Ala Leu Asp Glu His Ala Gly Thr Phe Asp Val Val Thr 165 170 175 Ser Ile Ala Met Phe Tyr Asp Val Glu Asp Pro Val Ala Phe Ala Arg 180 185 190 Ala Val Glu Arg Met Leu Ala Pro Gly Gly Val Trp Val Val Glu Val 195 200 205 Ala Tyr Leu Arg Glu Met Leu Ala Thr Thr Gly Tyr Asp Ser Ile Cys 210 215 220 His Glu His Leu Ser Tyr Tyr Ser Leu Ser Thr Leu Thr Phe Ile Leu 225 230 235 240 Arg Gln Ala Gly Leu Glu Ile Arg Arg Ala Ser Val Asn Gly Met Asn 245 250 255 Gly Gly Ser Ile Cys Cys Val Val Thr Arg Ala Thr Glu Gly Ala Asp 260 265 270 His Ala Asp Gly Ser Val Ala Glu Leu Ala Ala Gln Glu Arg Glu Leu 275 280 285 Gly Leu Asp Gln Ser Glu Pro Tyr Glu Arg Phe Ala Asp Asn Val Arg 290 295 300 Ala His Arg Asp Glu Leu Val Lys Met Leu His Gly Leu Arg Asp Ser 305 310 315 320 Gly Ser Thr Val His Val Tyr Gly Ala Ser Thr Lys Gly Asn Thr Leu 325 330 335 Leu Gln Tyr Cys Gly Ile Asp Arg Thr Leu Ile Pro Tyr Ala Ala Glu 340 345 350 Arg Asn Pro Asp Lys Val Gly Ala Arg Thr Leu Gly Thr Asp Ile Glu 355 360 365 Ile Ile Ser Glu Ala Asp Ser Arg Ala Arg Arg Pro Asp His Tyr Leu 370 375 380 Val Leu Pro Trp His Phe His Asp Glu Ile Val Ala Arg Glu Ala Ala 385 390 395 400 Thr Val Ala Ala Gly Thr Lys Leu Ile Phe Pro Leu Pro Ser Leu Arg 405 
410 415 Val Val Gln Ala Ser Arg Thr Asp Ser Arg Val Gly Ser 420 425 <210> SEQ ID NO 11 <211> LENGTH: 357 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 11 Val Ala Gly Arg Thr Val Leu Tyr Pro Gly Pro Ala Thr Pro Leu Thr 1 5 10 15 Val Gln Ile Asp Val Asp Val Ala Asp Ala Arg Gln Ile Ser Tyr Leu 20 25 30 Leu Ala Ala Gly Pro His Gly Ala Gln Ala Arg Pro Gly Arg Thr Asp 35 40 45 Asp Pro Trp Val Arg Val Arg Tyr Asp Leu Ala Ala Leu Val Arg Asp 50 55 60 Val Phe Gly Pro Ala Gly Pro Trp Thr Gly Thr Gly Arg Asp Val Val 65 70 75 80 Met Lys Asp Glu Pro Gly Pro Val Glu Tyr Lys Pro Asp Asp Pro Trp 85 90 95 Leu Val Arg Arg Glu Glu Ala Thr Arg Ala Ala Tyr Gln Ala Leu Arg 100 105 110 Ala Cys Glu Pro Tyr Arg Gly Asp Leu Ala Ala Leu Ala Leu Arg Phe 115 120 125 Gly Ser Asp Lys Trp Gly Gly His Trp Tyr Thr Ser His Tyr Glu Arg 130 135 140 His Leu Gly Gly Phe Arg Asp His Arg Leu Asn Leu Leu Glu Ile Gly 145 150 155 160 Ile Gly Gly Tyr His Glu Pro Asp Ala Gly Gly Ala Ser Leu Arg Met 165 170 175 Trp Lys His Tyr Phe His Arg Gly Ser Val Tyr Gly Leu Asp Val Tyr 180 185 190 Asp Lys Ser Leu Leu Asp Glu Pro Arg Leu Thr Thr Leu Arg Gly Asp 195 200 205 Gln Ala Asp Pro Ala Met Leu Ala Asp Leu Ala Arg Arg His Gly Pro 210 215 220 Phe Asp Ile Val Ile Asp Asp Gly Ser His Val Ser Ser His Val Ile 225 230 235 240 Thr Ala Phe Gln Ala Leu Phe Pro His Val Arg Pro Gly Gly Val Tyr 245 250 255 Val Ile Glu Asp Leu His Thr Ser Tyr Trp Pro Glu Trp Gly Gly Asn 260 265 270 Gly Thr Asp Leu Ser Asp Pro Ala Thr Ser Val Gly Phe Leu Lys Thr 275 280 285 Leu Val Asp Gly Leu His His Arg Asp Arg Leu His Asp Gly Pro Tyr 290 295 300 Gln Pro Thr Tyr Pro Asp Leu Thr Val Thr Gly Leu His Leu Tyr His 305 310 315 320 Asn Leu Ala Phe Val Glu Lys Gly Arg Asn Thr Glu Gln Ala Asn Ala 325 330 335 Thr Trp Arg Pro Arg Asn Asp Pro Met Arg Asp Leu Pro Lys Pro Gln 340 345 350 Arg Ser Ala Gly Glu 355 <210> SEQ ID NO 12 <211> LENGTH: 292 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 12 Val Phe Leu Leu Gly Ala Pro Gly Thr Thr Pro Val Ala Pro Ala Val 1 5 10 15 Thr Val Val Asp Ala Gly Glu Ile Glu Asp Met His Ala Trp Leu Asn 20 25 30 Gly Pro Glu Ala Ala Thr Ile Asp Val Val His Asp Phe Ser Cys Gly 35 40 45 Gln Ile Asp Pro Asp Arg Leu Pro Arg Gly Met Ala Tyr Leu Ser Thr 50 55 60 His His Leu Thr Gly Lys Pro Lys Tyr Pro Arg Asn Cys Val Tyr Ala 65 70 75 80 Ser Tyr Ala Gln Arg Ala Gln Ala Glu Asn Asp Val Ala Pro Val Val 85 90 95 Arg Ile Ser Val Asn Gln Ala Arg Tyr Pro Phe Arg Ala Asp Lys Asp 100 105 110 Asp Tyr Leu Leu Tyr Leu Gly Arg Ile Ser Glu Trp Lys Gly Thr Tyr 115 120 125 Glu Ala Ala Ala Phe Ala Ser Ala Ala Gly Arg Arg Leu Val Val Ala 130 135 140 Gly Pro Ser Trp Glu Glu Asp Tyr Leu Ala Arg Ile Leu Arg Asp Phe 145 150 155 160 Gly Asp Ser Val Asp Leu Val Gly Glu Val Gly Gly Asp Arg Arg Leu 165 170 175 Asp Leu Ile Ser Arg Ala Thr Ala Met Met Val Leu Ser Gln Ser Thr 180 185 190 Met Gly Pro Trp Gly Val Val Trp Cys Glu Pro Gly Ser Thr Val Val 195 200 205 Ser Glu Ala Ala Ala Cys Gly Thr Pro Val Ile Gly Thr Pro Asn Gly 210 215 220 Cys Leu Ala Glu Ile Val Pro Ala Val Gly Thr Val Val Pro Glu Gly 225 230 235 240 Ala Asp Phe Thr Val Glu Gln Ala Arg Ser Val Val Ala Ala Leu Pro 245 250 255 Gly Pro Asp Ala Val Arg Ala Ala Ala Leu Glu Arg Trp Asp His Val 260 265 270 Val Val Ala Lys Glu Phe Glu Ala Ile Tyr His Asp Val Leu Ala Gly 275 280 285 Arg Thr Trp Thr 290 <210> SEQ ID NO 13 <211> LENGTH: 137 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 13 Val Lys Ile Glu Val Leu Gln Pro Ser Cys Asn Leu Asp Thr Val Arg 1 5 10 15 Asp Gly Arg Gly Gly Ile Phe Thr Trp Val Pro Pro Glu Pro Ile Leu 20 25 30 Glu Phe Asn Leu Ile Thr Met His Pro Gly Lys Val Arg Gly Leu His 35 40 45 Tyr His Pro His Phe Val Glu Tyr Leu Leu Phe Val Asp Gly Glu Gly 50 55 60 Val Leu Val Thr Lys Asp Asp Pro Asp Asp Pro Asp Cys Pro Glu Glu 65 70 75 80 Phe Ile His Val Ala Arg Gly Thr Cys Thr Arg Thr Pro Ser Gly Val 85 90 95 Met His Ala Val Tyr Ser Ile Thr Ser Leu Ser Phe Val Ala Met Leu 100 105 110 Thr Arg Pro Trp Asp Glu Cys Asp Pro Pro Ile Val Gln Val Gln Pro 115 120 125 Leu Pro His Thr Leu Ala Ala Asn Gly 130 135 <210> SEQ ID NO 14 <211> LENGTH: 314 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 14 Val Ser Asp Asn Arg Val Ile Val Phe Gly Gly Thr Gly Phe Leu Gly 1 5 10 15 Arg Gln Val Ala Lys Asn Leu Val Ala Ala Gly His Asp Val Leu Val 20 25 30 Val Ala Arg Asn Ala Pro Arg Ala Thr Thr Gly Tyr Arg Phe Arg Ala 35 40 45 Ile Asp Val Ser Gly Val Arg Pro Gly Glu Leu Ala Ala Met Leu Ala 50 55 60 Ala Glu Arg Pro Ala Ala Ile Val Asn Ala Thr Gly Gly Lys Trp Gly 65 70 75 80 Leu Thr Gly Arg Gly Leu Glu Ala Ser Cys Val Gly Ala Thr Glu Ala 85 90 95 Ile Leu Thr Ala Leu Ala Met Thr Ser Leu Val Pro Arg Phe Val His 100 105 110 Leu Gly Ser Val Leu Glu Cys Gly Leu Ala Ala Pro Asp Ala Pro Gly 115 120 125 Ala Ala Gln Arg Ser Ser Arg Pro Ala Ser Glu Tyr Asp Arg Phe Lys 130 135 140 Leu Ala Ala Thr Glu Ala Val Leu Glu Ala Ala Ala Gln Gly Thr Val 145 150 155 160 Asp Pro Val Val Leu Arg Leu Ala Asn Val Thr Gly Pro Gly Val Pro 165 170 175 Pro Ala Ser Leu Leu Gly Leu Val Ala Gly Ser Leu Val Glu Ala Ala 180 185 190 Arg Arg Gly Gly His Ala Asn Ile Glu Leu Thr Ala Leu Asp Ala Arg 195 200 205 Arg Asp Tyr Val Asp Val Arg Asp Val Ala Glu Ala Ile Arg Ala Ala 210 215 220 Ile Arg Val Pro Gly Thr Thr Val Pro Ile Ala Ile Gly Arg Gly Glu 225 230 235 240 Ser Val Ser Val Arg Thr Leu Val Ala Met Leu Val Asp Ile Ser Gln 245 250 255 Val Pro Ala Thr Val 
Val Glu Leu Pro Ala Pro Ala Ala Gly Ala Glu 260 265 270 Asp Trp Thr Arg Val Asp Leu Arg Pro Ala Arg Glu Leu Leu Gly Trp 275 280 285 Thr Pro Arg Arg Thr Leu Ser Glu Ala Ile Gly Ala Leu Trp Arg His 290 295 300 Ala Leu Glu Gly Asp Pro Val Glu Ser Arg 305 310 <210> SEQ ID NO 15 <211> LENGTH: 285 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 15 Met Leu Arg Leu Leu Arg Ala Leu Ala Ala Leu Asp Leu Leu Ala Glu 1 5 10 15 Pro Arg Pro Gly Arg Phe Thr Val Thr Pro Val Gly Ala Leu Phe Arg 20 25 30 Ser Asp Arg Pro Gly Ser Met Tyr Pro Leu Ala Arg Met Leu Thr Asp 35 40 45 Pro Thr Met Thr Ser Ala Trp Gln Asn Leu Glu Phe Ser Leu Arg Thr 50 55 60 Gly Gly Pro Ala Phe Asp Glu Ala Phe Gly Ile Asp Phe Phe Gly Tyr 65 70 75 80 Leu Ser Ser His Pro Glu Leu Ser Glu Leu Tyr Asn Ala Ala Met Ser 85 90 95 Gln Gly Thr Arg Gly Val Ala Arg Val Leu Ala Gly Ala Tyr Asp Phe 100 105 110 Gly Arg Phe Arg Thr Val Val Asp Val Gly Gly Gly Asp Gly Thr Ser 115 120 125 Leu Val Glu Ile Leu Ala Glu His Pro Arg Leu Gly Gly Val Leu Phe 130 135 140 Asp Ser Pro Ser Gly Val His Ala Ala Glu Gln Thr Leu Glu Ala Ala 145 150 155 160 Gly Leu Thr Ala Arg Cys Arg Ile Glu Thr Gly Asp Phe Phe Ser Glu 165 170 175 Val Pro Arg Asp Gly Asp Leu Tyr Leu Leu Lys Ser Val Ile His Gly 180 185 190 Trp Asp Asp Glu His Ala Ala Val Ile Leu Arg Asn Cys Ala Arg Ala 195 200 205 Ala Arg Glu Gln Gly Arg Ile Leu Leu Val Asp His Leu Met Pro Asp 210 215 220 Thr Val Leu Pro Gly Gln Ser Pro Thr Thr Tyr Leu Thr Asp Leu Gly 225 230 235 240 Leu Leu Val Asn Gly Gln Gly Met Glu Arg Thr Arg Asp Asp Phe Ala 245 250 255 Gly Leu Cys Ala Lys Ala Gly Leu Arg Ile Ala Glu Val Gly Ser Leu 260 265 270 Pro Ser Thr Gly Phe His Trp Ile Glu Leu Cys Pro Asp 275 280 285 <210> SEQ ID NO 16 <211> LENGTH: 276 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 16 Met Leu Thr Ala Glu Gln Ile Glu Ser Phe Val Ala Asp Gly Phe Val 1 5 10 15 Arg Val Pro Asn Ala Phe Pro Ala Ala Leu Ala Ala Glu Cys Arg Asn 20 25 30 Leu Leu Trp Lys Gln Leu Asp Val Asp Pro Asp Asp Ser Ser Thr Trp 35 40 45 Thr Arg Glu Val Val Arg Leu Gly Leu Arg Gly Asp Asp Ala Phe Val 50 55 60 Gln Ser Ala Asn Thr Pro Ala Leu Val Glu Ala Tyr Asp Gln Leu Val 65 70 75 80 Gly Ala Gly Arg Trp Arg Pro Leu Asp Met Val Gly Thr Phe Pro Ile 85 90 95 Arg Phe Pro Val Asp Arg Asp Pro Glu Gln Ala Glu Asp Tyr Gly Trp 100 105 110 His Ile Asp Ala Ser Phe Leu Ser Pro Glu Gly Val Ala Ala Met Ser 115 120 125 Ser Gly Gln Asp Trp Glu Gly Glu Leu Pro Leu Val Pro Pro Asp Tyr 130 135 140 Asp Arg Ile Phe Arg Ser Asn Leu Val Ser Arg Gly Arg Ala Leu Leu 145 150 155 160 Val Leu Leu Leu Tyr Ser Asp Thr Gly Glu Arg Asp Ala Pro Thr Leu 165 170 175 Ile Arg Val Gly Ser His Leu Asp Val Pro Pro Leu Leu Ala Pro Tyr 180 185 190 Gly Ala Glu Gly Thr Tyr Leu Ala Cys Arg Asp Val Gly Ala Asp Arg 195 200 205 Pro Leu Ala Met Ala Thr Gly Arg Ala Gly Asp Ala Tyr Leu Cys His 210 215 220 Pro Phe Leu Val His Thr Pro Ile Thr Asn Thr Gly Thr Ser Pro Arg 225 230 235 240 Phe Met Ala Gln Pro Ser Leu Gln Pro Thr Gly Glu Phe Asp Leu Asp 245 250 255 Arg Ala Asp Gly Gln Tyr Val Pro Val Glu Arg Ala Ile Arg Ala Gly 260 265 270 Leu Ala Arg Gly 275 <210> SEQ ID NO 17 <211> LENGTH: 265 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 17 Val Glu Ala Arg Leu Glu Arg Arg Asn Ala Arg Phe Gln Gln Trp Glu 1 5 10 15 Ala Leu Leu Thr Asn Arg Asn Thr Arg His Arg Leu Gly Glu Phe Leu 20 25 30 Val Gln Gly Val Arg Pro Ile Asn Glu Ala Ile Ala His His Trp Arg 35 40 45 Ile Arg Ala Leu Leu His Ala Gly Asn Leu Arg Ser Gln Trp Ala Arg 50 55 60 Asp Leu Val Arg Glu Gln Val Ala Asp Glu Val Ile Arg Leu Ser Pro 65 70 75 80 Glu Leu Leu His Glu Leu Ala Gly Lys Asp Glu Asp Thr Thr Glu Leu 85 90 95 Ile Ala Val Val Ala Ile Pro Pro Asp Asp Leu Thr Arg Ile Arg Val 100 105 110 Arg Pro Asn Gly Val Leu Val Val Leu Asp Arg Pro Ile Ser Pro Gly 115 120 125 Asn Val Gly Ser Leu Leu Arg Ser Ala Asp Ala Leu Gly Ile Asp Gly 130 135 140 Val Ile Val Ala Gly Arg Ala Ala Asp Leu Tyr Asp Pro Lys Thr Val 145 150 155 160 Arg Gly Ser Arg Gly Ser Leu Phe Ala Val Pro Ala Val Arg Ala Glu 165 170 175 Thr Pro Thr Ala Val Leu Glu Trp Leu Arg Thr Ile Asp Ala Met Thr 180 185 190 Leu Val Gly Thr Ser Glu Asp Ala Val Thr Asp Ile Trp Asn His Asp 195 200 205 Phe Thr Gly Pro Thr Ala Val Val Val Gly Asn Glu Thr Ser Gly Met 210 215 220 Ser Ser Phe Trp Ala Asn Asn Cys Asp Val Val Leu Arg Ile Pro Met 225 230 235 240 Val Gly Ser Ala Ser Ser Leu Asn Ala Thr Val Ala Ala Ser Ile Thr 245 250 255 Leu Tyr Glu Ile Thr Arg Gln Arg Ala 260 265 <210> SEQ ID NO 18 <211> LENGTH: 344 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 18 Met Arg Thr Pro Asp Met Phe Ile Gly Gly Val Gly Thr Phe Ile Pro 1 5 10 15 Pro Arg Val Ser Val Asp Trp Ala Val Ala Arg Gly Leu Tyr Trp Ala 20 25 30 Glu Asp Ala Glu Ala His Glu Leu Val Gly Val Ala Val Ala Gly Asp 35 40 45 Met Pro Pro Pro Glu Met Ala Leu Arg Ala Ala Gln Gln Ala Val Lys 50 55 60 Arg Trp Gly Gly Ser Pro Lys Glu Phe Asp Leu Leu Leu Tyr Ala Ser 65 70 75 80 Thr Trp His Gln Gly Pro Asp Gly Trp Pro Pro Gln Ser Tyr Leu Gln 85 90 95 Arg His Leu Val Gly Gly Asp Leu Leu Ala Leu Glu Ile Arg Gln Gly 100 105 110 Cys Asn Gly Leu Phe Ser Ala Met Glu Leu Ala Ala Ser Tyr Leu Thr 115 120 125 Ala Val Pro Glu Arg Thr Ser Ala Leu Leu Val Ala Ala Asp Asn Tyr 130 135 140 Gly Thr Pro Leu Ile Asp Arg Trp Ser Met Gly Pro Gly Phe Ile Gly 145 150 155 160 Gly Asp Ala Ala Ser Ala Ile Val Leu Thr Lys Gln Pro Gly Phe Ala 165 170 175 Arg Leu Arg Ser Val Cys Thr Arg Thr Met Thr Thr Ala Glu Ala Leu 180 185 190 His Arg Gly Asp Glu Pro Leu Phe Pro Pro Ser Ile Thr Val Gly Arg 195 200 205 Thr Thr Asp Phe Ser Ala Arg Ile Gly Gln Gln Phe Ala Ser Arg Ser 210 215 220 Pro Ala Ala Ala Ala Met Ala Asp Val Pro Gln Arg Val Val Glu Leu 225 230 235 240 Val Asp Gln Ala Leu Ala Glu Ala Glu Ile Gly Ile Gly Asp Ile Ala 245 250 255 Arg Val Gly Phe Met Asn Tyr Ser Arg Glu Val Val Glu Gln Arg Val 260 265 270 Met Thr Met Trp Asp Leu Pro Met Ser Arg Ser Thr Trp Glu Tyr Gly 275 280 285 Arg Gly Ile Gly His Cys Gly Ala Ser Asp Thr Ile Leu Ser Phe Asp 290 295 300 His Leu Val Arg Thr Gly Glu Leu Arg Pro Gly Asp His Met Leu Met 305 310 315 320 Leu Gly Thr Ala Pro Gly Val Val Leu Ser Cys Val Ile Val Gln Val 325 330 335 Leu Glu Ser Pro Ala Trp Thr Lys 340 <210> SEQ ID NO 19 <211> LENGTH: 240 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 19 Met Ser Asn Ala Gln Gly Thr Pro Ala Thr Gly Pro Lys Pro Pro Leu 1 5 10 15 Arg Ser Met Gly Asp Leu Asn Met Val Trp Glu Trp Arg Thr Pro Asp 20 25 30 Glu Met Gln Ile Gln Leu Ala Gly Thr Gln Pro Arg Asp Glu Tyr Leu 35 40 45 Gln Asp Arg Val Asp Arg Ala Lys Trp Met Ala Glu Arg Leu Gly Ile 50 55 60 Thr Pro Glu Ser Ser Ile Phe Glu Ile Gly Ser Gly Glu Gly Ile Met 65 70 75 80 Ala Asn Val Leu Ala Pro Ser Val Arg Arg Met Leu Cys Thr Asp Val 85 90 95 Ser Arg Ser Phe Leu Asp Lys Ala Arg Val Thr Cys Gln Asp His Ala 100 105 110 Asn Val Asp Tyr His His Ile Asp Asn Asp Tyr Leu Ala Ala Leu Pro 115 120 125 Ser Ala Glu Phe Asp Ala Gly Phe Ser Leu Asn Val Phe Ile His Leu 130 135 140 Asn Val Phe Glu Phe Phe His Tyr Phe Arg Gln Ile Ala Arg Ile Leu 145 150 155 160 Arg Pro Gly Gly Arg Phe Gly Val Asn Phe Leu Asp Ile Gly Ala Ser 165 170 175 Thr Arg Ser Phe Phe His Phe Tyr Ala Glu Arg Tyr Leu Thr Ala Asn 180 185 190 Pro Val Glu Phe Lys Gly Phe Leu Ser Phe His Gly Ile Asp Val Ile 195 200 205 Ser Ser Leu Ala Val Glu Ala Gly Leu Thr Pro Leu Leu Asp Glu Phe 210 215 220 Val Asn Glu Asp Gly Val Cys Tyr Leu Ile Leu Arg Arg Asp Gln Lys 225 230 235 240 <210> SEQ ID NO 20 <211> LENGTH: 438 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 20 Met Arg Ile Leu Phe Thr Val Ser Asn Trp Ala Gly His Tyr Met Cys 1 5 10 15 Met Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg 20 25 30 Val Ala Cys Pro Pro Gln Gln Val Ser Gly Val Gln Ala Thr Gly Leu 35 40 45 Met Pro Val Ser Met Leu Asp Ser Ala Asp Met Met Glu Ser Ala Arg 50 55 60 Leu Ala Tyr Trp Ser Leu Ala Ile Asn Thr Pro Pro Gln Ser Gly Glu 65 70 75 80 Met Pro Leu Pro Leu His Pro Phe Thr Gly Glu Ala Leu Gly Ser Val 85 90 95 Arg Asp Phe Asp Thr Gly Met Leu Ser Asp Phe Trp Lys Arg Ser Ile 100 105 110 Ala Ala Val Gln Arg Ser Phe Asp Asn Ala Val Asp Tyr Ala Ala Ser 115 120 125 Trp Arg Pro Asp Leu Val Val Tyr Asp Ile Met Ala Val Glu Gly Ala 130 135 140 Leu Val Gly Ile Leu Asn Asp Val Pro Ser Val Phe Phe Gly Pro Gly 145 150 155 160 Phe Ile Gly Thr Val Glu Thr Glu Pro Gly Leu Asn Met Met Ala Gly 165 170 175 Asp Pro Leu Ser Cys Phe Glu Lys Tyr Gly Val Gln Trp Thr Arg Arg 180 185 190 Asp Ile Lys Tyr Ala Val Asp Pro Ser Pro Asp Val Ala Ile Pro Pro 195 200 205 Met Gly Asp Ala Leu Arg Ile Pro Ile Arg Tyr His Pro Phe Asn Gly 210 215 220 Ser Gln Asp Val Asp Pro Trp Leu Leu Gly Pro Val Lys Gly Lys Arg 225 230 235 240 Val Cys Val Val Trp Gly Asn Ser Ala Thr Gly Val Phe Gly Glu Arg 245 250 255 Leu Pro Ala Leu Arg Gln Ala Val Glu Thr Ala Ala Gln Leu Ala Thr 260 265 270 Glu Val Val Leu Thr Ala Ala Leu Ser Glu Val Asp Ala Met Gly Thr 275 280 285 Leu Pro Pro Asn Val Arg Val Leu Arg Asn Cys Pro Leu Glu Leu Ile 290 295 300 Leu Pro Asp Cys Asp Leu Leu Ile His His Gly Ser Ala Asn Cys Leu 305 310 315 320 Met Asn Gly Ile Ala Met Gly Val Pro Gln Leu Ser Leu Ala Leu Asn 325 330 335 Phe Asp Gly Gln Ile Tyr Gly Arg Arg Leu Asp Pro Gln Gly Ala Thr 340 345 350 Lys Thr Leu Pro Gly Leu Leu Ile Asp Arg Asp Ala Ile Asp Lys Ala 355 360 365 Ile Gly Glu Val Leu Phe Asp His Arg Tyr Arg Arg Arg Ala Val Glu 370 375 380 Leu Ser Glu Ser Val Gly Ala Ala Pro Thr Ala Ala Gln Val Ala Asp 385 390 395 400 Leu Leu Val Thr Leu Ala Arg Glu Gly Glu Leu Thr Ala Ser Asp Val 405 
410 415 Ala Gly Leu Val Thr Gly Arg Gly Pro Gln Arg Lys Glu Ile Thr Gln 420 425 430 Asp Thr Val Ser Glu Val 435 <210> SEQ ID NO 21 <211> LENGTH: 405 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 21 Met Ser Val Pro Gly Asp Ser His Ala Thr Pro Ser Pro Thr Ala Asp 1 5 10 15 Gln Thr Ala Cys Val Leu Pro Trp Ile His Leu Cys Ala Ser Ile Asp 20 25 30 Gly Val Tyr Gly Arg Cys Cys Val Asp Asp Ser Met Tyr His Thr Glu 35 40 45 Leu Tyr Asp Glu Gln Glu Glu Pro Ala Phe Ala Leu Asn Asp Asp Ala 50 55 60 Ile Gly Cys Ser Pro Gly Ser Arg Tyr Ala Lys Asp Asn Pro Asp Arg 65 70 75 80 Val Met Gly Ile Arg Glu Ala Phe Asn Ser Pro Asn Met Lys Arg Thr 85 90 95 Arg Leu Ala Met Leu Gly Gly Glu Arg Val Glu Ala Cys Lys Tyr Cys 100 105 110 Tyr Phe Arg Glu Asp His Gly Ala Gln Ser Tyr Arg Gln Asn Val Asn 115 120 125 Arg Arg Phe His Gln Glu Tyr Asp Leu Asp Ala Leu Ala Ala Arg Thr 130 135 140 Ala Ala Asp Gly Thr Val Glu Glu Phe Pro Phe Phe Leu Asp Ile Arg 145 150 155 160 Phe Gly Asn Leu Cys Asn Leu Arg Cys Val Met Cys Thr Tyr Pro Val 165 170 175 Ser Ser Ser Trp Gly Ala Lys Gln Arg Pro Ser Trp Ser Ser Ala Val 180 185 190 Ile Asp Pro Tyr Arg Asp Asp Asp Glu Leu Trp Ala Thr Leu Arg Glu 195 200 205 Asn Ala His Leu Ile Arg Lys Leu Tyr Phe Ala Gly Gly Glu Pro Phe 210 215 220 Leu Gln Pro Gly His Phe Ala Met Leu Glu Leu Leu Val Glu Thr Gly 225 230 235 240 Asn Ala His Asn Val Asp Ile Gln Tyr Asn Ser Asn Leu Thr Val Ser 245 250 255 Pro Asp Asn Ala Ile Lys Leu Leu Arg His Phe Lys Ser Val Gly Ile 260 265 270 Gly Ala Ser Cys Asp Gly Val Gly Glu Val Phe Glu Tyr Ile Arg Ala 275 280 285 Gly Gly Lys Trp Ala Asp Phe Val Ala Asn Leu Arg Leu Leu Arg Ser 290 295 300 Asp Phe Asp Val Trp Leu Gln Val Ser Pro Gln Arg His Asn Leu Trp 305 310 315 320 Asp Leu Arg Asn Val Leu Glu Phe Ala Arg Thr Glu Gly Leu Glu Val 325 330 335 Asp Leu Ala Asn Val Val Gln Trp Pro Gln Asp Leu Ser Val Ala Ser 340 345 350 Leu Ser Ala Glu Glu Lys Ala Arg Ala Thr Gln Glu Leu Thr Asp Leu 355 360 365 Ile Ala Trp Cys Ala Glu Leu Gly Trp Asp 
Lys Pro Ala Thr His Leu 370 375 380 Asp Ala Leu Arg Ser Phe Met Asn Ser Leu Asp Pro Thr Arg Leu Val 385 390 395 400 Asp Asp Gly Val Ser 405 <210> SEQ ID NO 22 <211> LENGTH: 14186 <212> TYPE: DNA <213> ORGANISM: M. carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (7)..(891) <223> OTHER INFORMATION: ORF 18 (negative strandedness) incomplete: N-terminus only (C-terminus undetermined) <221> NAME/KEY: misc_feature <222> LOCATION: (894)..(1622) <223> OTHER INFORMATION: ORF 19 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (1622)..(3067) <223> OTHER INFORMATION: ORF 20 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (3382)..(4521) <223> OTHER INFORMATION: ORF 21 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (4602)..(5576) <223> OTHER INFORMATION: ORF 22 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (5584)..(6543) <223> OTHER INFORMATION: ORF 23 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (6594)..(7604) <223> OTHER INFORMATION: ORF 24 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (7604)..(8653) <223> OTHER INFORMATION: ORF 25 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (8679)..(9434) <223> OTHER INFORMATION: ORF 26 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (9789)..(10715) <223> OTHER INFORMATION: ORF 27 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (10916)..(11980) <223> OTHER INFORMATION: ORF 28 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (11983)..(12969) <223> OTHER INFORMATION: ORF 29 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (13027)..(14052) <223> OTHER INFORMATION: ORF 30 (positive strandedness) <400> SEQUENCE: 22 ccggtccacc cagtgccgga gcgactcggc cggctggtag tcgaacggct 
cctcgtcgcc 60 gaactgcggc tcgttcgccc gcatctggat gcaggagcgc aggacctgct gtttcagccc 120 gatgtactcc gcgctgtgcg ggcccaactg ccattccacc tcggccggcc ggtactcgaa 180 gtagatgacc cggcgctgct tgccgaccac cgcgggcgcc ccgtgcagcg tgagaatgtt 240 gtgcaacagg aggtcccccg gctgcatcac ggctggcacc gccccggtgg tgtcccattc 300 gctcgcgttg agctggtccg cggtggcggt caaccggtcg tcgccccagt agttcgactc 360 cgggatgcac cagacgcagt tgtcctcggg ggcggggtcg aggtagatcc cggcgtcgat 420 gacccggccc gcaccggtga cgccgaccgc gttgtcgtag agcccggcgt cgcggtgcca 480 ggccagcctg ggcgctccgg cgggtgtctt gaagaccatg ctgtcccagg tcgggatcag 540 attcggtccg accaactgct ccatgatccg cagcagcaac ggatgaccgg ccagcatggc 600 gatgggtcgg gccttgtcga cgacgtactc gatccgcacc ggcgcggcgc ccggctggtc 660 gggctccaac gtccagaccg tgtcctccat ggaccgggtc gaccacgccc ggtcgatcag 720 ggcccggccg gcctcctgga cgtcggccag ttcctgcggt gtgagcaacc cgcggacgac 780 caggacaccc tgccggcgaa aggccgtgac gtgctcgggc agcagccccg tccgccggat 840 gtcgcattcc gcgatccggt tgacgacacg caggtctccg acggaactca tcagtccaca 900 cccttctgat cagcggtcgc gggcccggcg gcgcggtccg cgacgaagta ccagtcctgc 960 tccagggccg tggtgagcgc ggcgcggctc agcgcccgcc cgcccgcgag ccacgccggc 1020 agggtgaaca cctggtaacc cagctcctcc accagcagtg accacaggtc atccgtcgtc 1080 gtgccgtact cccgcatgac gttgtcgccg ccatgctcga agacgatcac cggccgacca 1140 cgccgcaacg tgtcgcgcgc cccccgcagg gccagcacct cgcccccctc gatgtcgatc 1200 ttgatcaggt cgacacgtac gtccgcggga atgacgtcgt ccaggcgcac cgtgtccacg 1260 gtgatctcgt gcagcgtctc cgccggccgg tcgtagggac gtcggcgcag gccgctgtag 1320 ccggggttgg acaccacgtg cacgaagctg tgccggcccg cggcgtcggc ggctgccgcc 1380 ctgacgaccg tgaccgacgg cagccggtct gccaactcgt cggcgagtgc gggcaacggt 1440 tcgacggcga agtggttgcc ctccggcgcc acccggacca ggtgctgggt gatctcgccg 1500 acgccggcgc cgaggtccac cgacacggcc gtccggccgc agacccgctc gatgatctcg 1560 acggtgagtc ggtcgtagtc ctcgttgcgc tgggcggggt cggccgccag ggagccgctc 1620 acgccggccg ggtccgtccg atgcccagtc gcgggctcgt catcaggtac aggccggtct 1680 ccgggtcgaa cagctcgagc ccgtcgatct tgttcagccc ctcgctgacc ggcgctttgg 1740 ggacaccggc 
ctccgccatc cggcgggcct gctcggcggc gaggaagagc tggccgaccg 1800 agttgccggc gtcggccgcc ggtagctgcg gcggaaccga gcgaccggcc atggcgtcgc 1860 tcacgtcggc gagcccggcg atcagctggg cgaacgcctt ggcgccgtcg acccgttcga 1920 actcggcctc gtcgtgctcg ccgatcagcc ggtcggcgag ggcgaagtag ttcgccttgc 1980 cggcctgctg ctggtagacg gccgcgacca gggtgaacag gcgctcgaag gcgttgcggt 2040 agagcgtctc gtagaacccg taggcctgct cctcctcgac gtcgccgttg acgatgccca 2100 ggatcgacgc cgacgcgagc atgccgctgt agagcgcgag gtgcacgccg gtcgacagca 2160 gcgggtccag gaagcaggcg ctgtcgcccg cggcgaagta gccggggccg cagaagctgt 2220 cggacacgta cgagaagtcc tgctcgaccc ggacacccgg ctggtacgtc ccggtcgcca 2280 ccaggctccg caccgtcggc gactcctcga cgagcgcggc gagcatgtcc tcgagtgagc 2340 cgtgttcgct gcggcgttcg aggaagcgct tctggtgaca cacgaacccg acgctgtagc 2400 ggttgccccg cagcgggatg acccagtacc agccgtccgg cgcgccgatc acgttgatgc 2460 caccctgcgg cgagttgggc agcagtgatc cgccgtccca gtagccccag atggcgacgt 2520 tcttgaacgt gtcgttcgcc cgccggtgct tgaagtggcg ggcggggatc atgccggcac 2580 ggccggaggc gtccacgacg aagtcgaact cggtggtgcg ccgctcgccg ctgtccggct 2640 cggccccact ccgcggccac cgcggcgggt cgccgtcgaa gatcacccgg ttgacctcgg 2700 cgttctggat aaccgtcgcg ccctgtttgg cggcgttgtt cagcagcacg tggtcgaagt 2760 cgtcgcggtc cacctgccag gacctgactc cgggaccgaa gatctcggtc cagtcgatgg 2820 cccagtcctc cttgccccac cgcagcagca caccgttctt ctgggtgtag ccgcgggcgt 2880 cgacgtcgct cagcgcgccg acgaagtcga cgatggtccg gcacgaggac gcgatcgact 2940 cgccgatgtg gtagcgcggg aaggtctcct tctccagcaa ggtcaccgac agtcccgcac 3000 gcgcgagcag tgccgcggcg gtcgatccgg ccggaccgcc accgataacc aagaccgtgc 3060 tgaccatgag gctcccaatc gtgaggagga cgggacgtga tccttctatt gagaacatca 3120 ccgtacggcg tgtccagatt ggcgttctac gatcactggg aaggtctagt gggagcgcta 3180 gtgtcatgcg cccgaagtga tctacgatgg ggctggttga ccgtctggcg tcaacctgat 3240 cccagcatgt tcggcccggg aacgggttct cgccgaattg ctgggcggaa ccctcgaatt 3300 ggtcggctgt cggctgcggg gggcttggtg tgcgccgcgc cgggcacttg tcgtccagac 3360 attaatgcgc atggagggtt cgtgaagata ctctttctgc cggggccggt gaaatcgaac 3420 gtattcgggg tgggggccct 
ggccgtcgcc gcacgggtga gcggccacga ggtcatcgtc 3480 gcgtccaccg tggagggcgc cgccgcggcg acgggcatcg gcctgcccgc cgtgacgacg 3540 agtgagctga cgctgaccca gcttctgacc accgatcgcg ccgggaacgc gctggagttt 3600 cctaccgacc ccgccgagtt gccgaccttc gtcggccaca tgttcggtcg tctcgccgcc 3660 gtcaacctcg gcccgacgcg tgacctcgtc accggctggc ggccggacgt cctggtgagc 3720 gggccgcacg cctacgccgg cccgctgctg gccgccgagt tcggcctgcc gtgcgcgcgg 3780 cacctgctca ccgggacccc gatcgaccgg gacggcacgc accccggcgt cgaggacgag 3840 ctcgagcccg agctgagcgc gctcggcctc gaccgggtgc ccgacttcga cctggcgatc 3900 gacatcttcc cggccagcat ccggcccgcg ggcggaccgg tgcagccgat gcggtggacg 3960 cccaccagcg agcagcggcc cgtggaaccg tggatggtca cgccggggga ccggcgccgg 4020 gtgctgctga ccgccggcag cctggtcacg ccgacgcacg gcatggacct gttgtggaac 4080 ctcgtgaccg cgctcgcgga cctggacgtc gaactggtcg tcgccgcccc ggaggaggtc 4140 ggcgcgctgg tccggaagat gcccggggtg gcgcacgcgg gctgggttcc gctggacatg 4200 gtcctgccca cctgcgccct gatcgtgcat cactccggca cgatgaccgc gctcaccgcc 4260 atgcaggccg gtgtcccgca gctgatcatc ccgcaggaga gccggttcgt ggactgggcc 4320 gggatgctgg cgaccaaggg catcgcgatc agcctgccgc ccggtgcgga caccgaggac 4380 gccctcgcgg gtgcggcccg ccggctgctg accgagccgg cctacgccac ggccgcgcgt 4440 gccctggccg acgagatcgc cgagatgccc ctgccggtca ccgtcgtcga cgtgctgcgg 4500 gacctgaccg agaaggcgcg gtgatctctg gggatttctt ggaccgtccc gccctacagt 4560 cggtgccgaa tcccgtccgc tctggcgaaa ggggagttca tgtgacgacc gagccggatc 4620 gatctcgata cctctaccga cagatgcgtc tcatccggga gttcgaggag cactgcctcg 4680 aaatggccgt cgccgggacg atcgtcggtg gtatccaccc ctacatcggt caggaggccg 4740 tcgcggtggg cgtgagcgcc cacctgcgag aggacgacgt catcaccagc acccaccgtg 4800 ggcacggcca cgtgctcgcg aagggcgccg atccgaagcg gaccctggcc gagctgtacg 4860 gcgcgagcac gggcctcaac cgggggcgtg gtgggtcgat gcacgccgcc gacgtggggc 4920 tgggcgtcta cggcgcgaac gggatcgtgg gcgcgggcgc acccatcgcg gtgggcgcgg 4980 cctgggcagc ccgacgccag ggccgtgacc agcaggtggc cgtggcgtac ttcggcgatg 5040 gcgcactcag ccagggcgtg gtgctcgagg ccttcaacct ggcggcgttg tggtcgctgc 5100 cggtgctgtt cgtctgcgag aacaacgggt 
acgccatcag cctgccggtc gaccggggcc 5160 tggcgggcga cccggtgcgt cgggcggccg ggttcggcct gaccgccgaa gcggtggacg 5220 ggatggacgt ggaggcggtc accgaggccg cggggcgggc ggtggccgcc tgccgtgccg 5280 gtgggggacc gcacttcctc gagtgcgtca cctaccggtt ccgtggtcac cacaccgtgg 5340 aacacctgat gggcatcaac taccgcgacg aggccgaggt ggccagctgg acggaacgtg 5400 acccgctggc gcgccagcgg gcgcgtctcg cgccggcggt cgccgacgag gtcgacgcgg 5460 agatcgccgc gctgatcgcc gaagccgtcg cgttcgccgg atcgagtccc gggtccgacc 5520 cgcgcgacgc tctggactac ctgtacgccg gcacggcgcc gacgcggccg ggagcgtgat 5580 ccgatgccga gtctgtccta catcgcagcg ttgaaccagg ccctgcgcga cgagatggcc 5640 cgtgacgaac gggtgtgcat cttcggcgag gacgtctgcc tgggcctcac cggcatcacc 5700 aaggggctgg ccgaggcgca cgatggccgg gtggtggaca cgccgctgtc cgagcaggcg 5760 ttcaccagcc tggccaccgg ggccgccatc gccggccagc gtcccgtcgt cgagttccag 5820 atcccgtccc tgctgtacct ggtgttcgag cagatcgcca accaggcgca caagttctcg 5880 ctgatgaccg gcggccaggc cagcgtcccg gtcacctatc tggtacccgg ctccgggtcc 5940 cggtcgggca tggccgggca gcactccgac cacccgtaca gcctgctcgc gcacgtgggg 6000 gtcaagaccg cggtgccggc gacgcccagc gacgcgtacg gcctgctgct gtcggcgatc 6060 cgggagccgg atccggtcgc cgtgttcgcg ccgaccctgc tgatgggcac gtccgaggag 6120 atcgacggtg acctcgacgc cgtgccgctg ggcagtgccc gtacgcaccg ggagggcacc 6180 gatgtcacgg tggtcgccgt gggccatctg gtcccggtcg ccctccaggt ggccgccgac 6240 ctggccggcg aggcgtcggt cgaggtcatc gacccgcgca cggtctaccc ggtcgactgg 6300 gagaccctgg gcaagtcgat cagccggacc ggtcggctgg tggtgatcga cgactcgaac 6360 cggatgtgtg gtttcggcgc cgagatcgcg gcgaccgcgg cggaggagtt cggcttggcg 6420 gtaccgccga agcgggtgtc ccggcccgac ggcgcagtga tcccgtacgc cctgaacctg 6480 gaccacgcgc tgctgcccga cgccctcgaa ctcaccaagg ccatccgggc cgtgctgcgt 6540 cggtagctgc tgtgggggta tcggacgcgg tgttgaagga gagaggccgg cacatgacat 6600 cgggacgccc gcgggtggcg accgtcacgg tgaccaccaa cgagagcaag tggctgcgtc 6660 gctgcctggg ggcgcttgtc gacagtgaca ccgaaggatt cgatcttgac gtgcacctga 6720 tcgacaacgc ctccaccgac ggcagcgcgg agctggtcgc gcgggagttc ccgagcgtga 6780 agatcacccg taatcccacc aacctcgggt tcgccggcgc 
caacaacgtc ggcatccggg 6840 ccgcgctcgc cgccggcgcc gactacgtgt tcctggtcaa cccggacacc tggaccccgc 6900 cacggctcgt ccgggcgatg gtcgaattcg ccgagcgttg gccggagtac ggcatcgtcg 6960 gcccgctgca ataccgctac gacgccgagt cgaccgagct cgtcgagttc aacgactgga 7020 ccaacacggc actctggctg ggcgaacagc acgcgttcgc gggcgacggg atggctcatc 7080 cctccccggc cggcagcccg caaggccgcg cgccgaggac cctggagcac gcgtacgtcc 7140 agggcgcggc gctgttcgcg cgggtggcga tgctgcgcga ggtgggcgtg ttcgatgagg 7200 tgttccacac gtactacgag gaggtggacc tgtgccggcg ggccagatgg gcgggctggc 7260 gggtggccct cctgctcgac gagggcctgc aacaccacgg cggcggcggt gcggccacgc 7320 gcagcgcgta cacccgggtg cacatgcggc gcaaccgtta ctactacctg ctcacggacg 7380 tggactggca cccgaccaag gcgacccggc tggccgcccg gtggctggtg gcggacctgg 7440 tcggccggac cgtggtcggc agggtggacc cgatgaccgg ggcccgggaa accctggcgg 7500 cggtgcgctg gctggcgggc cacgcgccga ccatagcgga acgtcgacgc agtcaccggg 7560 cgttgcgcgc gggccgtacg ccggcacggc gtgaggtggc gtcgtgaccg ggccccgcat 7620 cctcatctcc ggcaacttcc actggcaggc cgggttcagt cacacggtgg agggctacgt 7680 ccgggccgcc ggcgcggcgg gctgcgaggt ccgggtcagc ggcccgctgt cgcggatgga 7740 cgaccaggtg cccgggctcc tgcccgtcga gccggacctc ggttggggca cccacctggt 7800 ggtgatgttc gaggcccggc agttcctgac gcccgagcag atcgaactgg cgacccgcac 7860 gttcccccgg tcgcgccgcc tggtcgtgga cttcgacctg cactgggccg acgagcatcc 7920 ggaactgggc gtggacggca cggcgggcaa gtacaccgcc gagagctggc gctcgctcta 7980 cagcgagctg agcgacgtga tgctacagcc gaagctcacc gggaagatgg ccccgggagc 8040 ggagttcttc tcgtgcatcg gcatgcccga gaccgtgtgc cacccgttga ctctcggccg 8100 gcagcgggac tacgacctgc agtacatcgg cagcaactgg tggcgttggg agccgctgac 8160 ggccctggtg gaggcggcgg tgacgctgcg tcccgtgccg cgcatgcggg tctgcggccg 8220 tttctgggac ggcgccacct ctcccgggtt cgaggacgcg accacaagcg tcccgggctg 8280 gctggcggaa cgcggcgtcg agctctgccc gccggtggcc ttcgggcagg tgatcccgga 8340 gatgggccgg tcgctgatct caccggtcct ggtccgtccc ctggtggcgg gcacgggcct 8400 gctgacgccg cgcatgttcg agaccctggc gtcgggcgcc ctgccggctc tctccgccga 8460 cgcggagttc ctcgccgagg tctacggcga cgagtgcgcg cccctgctgc 
tcggcgacga 8520 tccggccacg acgctcgccc gcctcaccac ggacttcgag cggcatgccc ggatcgtcgg 8580 tcggatccag gaccgggtgc gggaggagta cggctacccc cgcgtcctgc ggaacctgct 8640 ggccttcttc gggtaggggg gcgtggtcgg gccggctatc cccagtccat ccacgggcgg 8700 ggctcggggt cggcgacctc ggccggcgcg ctcatgaaca ccagcacgta cgcgcgccga 8760 ggctggtcgg tcaggttcgg gccggcgtag tgcggggttc ggaagtcgtg caccaccgcg 8820 cccccgggcg ccagcgggca ggcgaccgcg ctggtcgggt cgacgtcgtc ggtcatcagg 8880 ccacggatgc ggtcgtcgtt gtcgatgtgg tggtgcggga gcaccggacc gcggtggccc 8940 cccggcaggt agtgcaggca gccgctctcg acggtggcct cgtccagggt cgtccagatg 9000 ctcaacccgc gccgcctcca ccggggatcc atgtaggcct cgtcctggtg ccacggcgtc 9060 ggagcgccgt atcgcggcgg cttcaggatg gcgtgcccgt agaactcgag ctcttcctcg 9120 gccatatcga gaaaggctga cgcaattgac cggcaccgcg cgaagtgcgg gctatccagc 9180 aactccggta cgtatttctc cggcttgatg atctgcggca gcagtggcgg cccttcgcgg 9240 tcgcgttggc cggcgatatc gtagaagtcc tccgcgcccg gggtcgcgcg ccggacgaaa 9300 agccggtcgt aggcctggcg cagccacgcc acctccgact cgctcgcgac ctgcgggagt 9360 atcgcgaacc cacgactgcg gaactcctcc cggtcacggt ggtctatggt gcccaccact 9420 tccatcgcgt ccatgccgtc tccttcaagg gatgacctcg acagtcacga tatgggtgcg 9480 gcacccgaca gtcatcaccc caggtcagga ttagggaacg gcctagaatc tgcggacaag 9540 tcgaatgtcg ccccccgttg tgtcagactc gccgtgtccc ttttcgagcg gaagcagcca 9600 ttcatgaccc gacaccacgc cgtcctcccg ggcggcggca ccacgcgcgc cctcctcgcg 9660 cgggcgcggc ccaccgtgcg gacggccccc ggcggcggcg cgctccggca cgtgacgtca 9720 cgcggtcgac gtgctgtcac cggcgttcga gtggtgttcc cgctgccggc cgagcgccag 9780 ggctgaccgt gccgacggcg atcgtggtgg gtgccgaggg ccaggacggg gtgttgttga 9840 gccggctgtt gcgggcccac gactaccggg tggtgccggt gggccggcac ggcccggtcg 9900 acatcgtccg gcccgacgac gtggccgaac tggtgaccga gctgcgaccg gacgagatct 9960 acctgctggc agcggtgcag aactccgcgc aggacccggt cgccgatccg gtggagctgg 10020 cgcaccggtc gtacgccgtc aacacgttgg ccgtggtgca cttcctggag gccgtcgagc 10080 ggcacagccc ggcgaccagg gtgttctacg ccgcctcctc acacgtcttc ggcaggccgg 10140 acacgccggt acaggacgag accacgccgc ttcgaccgac ctccgtctac 
ggcatcagca 10200 aggcggccgg tctgctgcac tgtcgttcct accgggcgcg gggggtgttc gcctcggtcg 10260 gcatcctcta cagccacgag tccccgctcc gccgccccgg cttcgtgtcc cgcaagatcg 10320 tggacgccgt ggtccgcatc cagcgcggcg aagcgttccg gctcgtgctc ggcggcctgg 10380 cggccgaggt ggactggggc tacgcgccgg actacgtgga tgcgatgagg cggattctcg 10440 gcctggcgac agcggacgac tacgtggtcg cctcgggggt gcggcgcacc gtccgcgagt 10500 tcgcggagac cgccttcgcg gcggtcgggc tggactggcg cgaccacgtc gaggagaacg 10560 ccgcggtgct cacccggccg agcgtgccgc tggtcggcga cgcgagccgg ttgcaggccg 10620 cgaccggctg gcgcccgagc gtcgacttcg ccggcatggt gcgggccctg ctgcgggcgg 10680 cgggtgccga cctggtcggg acgggccagg acggatagcc gacctgtccg tgcgcgctgc 10740 ttgttcagcc tggtcggctg gtccgactcc cggcgtcgcc gtcgatcgat aacggaccct 10800 ttagtaggga aatcacggga cagacttcgg taccgtcgaa gaaccagtcg cctccactgc 10860 cggagtccat cgtgaaccac gttcctgtcc cggtccgaac atccaggatc gactcgtgaa 10920 agcgctggta ttggcgggtg gaatcggctc gcgaatgcgc ccgatcaccc acacgtcagc 10980 gaagcagctc attccggtcg cgaacaaacc ggtcctcttc tacggcctgg aagcaattcg 11040 tgacgccggg atccgggaag ttggcatcat cgtcggcagc accgcgccgg agatcgagcg 11100 ggcggtcggt gacggctcgc agttcggctt gaaggtgacc tacctgccgc aggacgcccc 11160 gcgcggtctg gggcacgcgg tcctgatcgc ccgggacttc ctcggcgacg acgacttcgt 11220 gatgtacctg ggcgacaact tcgtcctcgg tggcatcaac gacgcggtcg agcggttccg 11280 ccgggaacgc ccgcacgccc agctgatgct gaccaaggtc aaggatccgc acgccttcgg 11340 catcgcgacg atgggcccgg acggccgggt cgtcgatgtc gaggagaagc cccggtatcc 11400 caagagcgac ctcgctctgg tgggcgtgta cgtcttcagc ccggtcgtgc acgaggcgat 11460 agccgaactg aagccgtcgt ggcgcaacga actggagatc accgacgcca tccagtggct 11520 gatcgaccac gacaggcgta tcgaatccac cataatcacc ggattctgga aggacaccgg 11580 cagcctcgcg gacatgctgg agatgaaccg gttcatcctg gaaagcctcg actccgaggt 11640 gagtggcgag gtcagtgcgg acaccgagat caccggtcgg gtcgtgatcg ggcccggggc 11700 ggtcatcacc gggtcgcgga tcatcgggcc cgtcgtggtc ggggccggct cgatcattcg 11760 caactcgcag ctcggcccgt tcacgtcgat cgactgcgac tgcaccgtca tcgacagcga 11820 gatcgagcag tccatcgtgc tccgcggcgc 
cttcatcgac ggcatcggcc ggatcgagtg 11880 gtcgatgatc ggccgtgagg cgcgcctgac cccgggcccg cgcgcgccga agacgtaccg 11940 cttcgtcctc ggcgaccaca gtgaagtacg ggtaggcgtg tagtgccgag ggtcttcgtg 12000 gccggtggcg ccggcttcat cggctcgcac tacgtgcggg aactcgtcgc cggggcgtac 12060 gccgggtggc agggctgcga ggtcacggtg ctcgacagcc tcacctatgc gggaaacctc 12120 gcgaatctcg ccggggtgcg ggacgccgtc accttcgtcc gcggtgacat ctgcgacggc 12180 cgactgctcg ccgaggtcct gcccggccac gacgtggtgc tgaacttcgc ggccgagacc 12240 cacgtcgacc ggtccatcgc cgactcggcg gagttcctgc ggaccaacgt tcagggcgtc 12300 cagtcgctca tgcaggcgtg cctgaccgcc ggagtgccga ccatcgtcca ggtctccacc 12360 gacgaggtgt acggcagcat cgaggccgga tcctggagcg aggacgcgcc gctggcgccg 12420 aactcgccgt acgccgcggc caaggcgggc ggtgacctga tcgccctggc gtacgcgcgg 12480 acgtacggac tgccggtccg catcaccagg tgcggcaaca actacggtcc ataccagttc 12540 ccggagaagg tgatccccct cttcctcacc cgtctgatgg acggtcggtc ggtcccgctc 12600 tacggcgacg ggcgcaacgt ccgcgactgg atccacgtgg ccgaccactg ccgtggcatc 12660 cagacggtgg tcgaacgcgg tgcgtccggc gaggtctacc acatcgccgg gacggccgag 12720 ctgaccaacc tggaactcac ccagcacctg ctggacgcgg tcggcggaag ctgggacgcc 12780 gtcgagaggg tgcccgaccg taagggccac gaccgccgct actcgctttc cgacgcgaag 12840 ctccgggccc tgggctacgc cccgcgcgtc cccttcgccg acggcctggc cgagacggtc 12900 gcgtggtacc gcgcgaaccg gcactggtgg gagccgctgc ggaaacaact cgacgccgtc 12960 ccgcacgact gacggtgcgg caccgcgatt gtccatgttc tcagccaacc ttcgaaggag 13020 cccggtatgg ctcactgcct ggtcacgggt ggcgccggtt tcatcggttc gcacctggcg 13080 ggacggttga ccagtgacgg gcaccgggtc accgtgctcg acgatctcag cggcggcagc 13140 gcctcccgcg tgcccgcggg cgccgatctg atcgtcggct cggtgaccga cgccgacctg 13200 gtggaacggg ccttcgccga gcaccgcttc gaccgggtct tccacttcgc ggccttcgca 13260 gccgaagcga tcagccactc ggtcaagaag ctcaactacg gcaccaacgt gatgggcagc 13320 atcaacctca tcaacgcgtc gttgcagacc ggggtgtcgt tcttctgctt cgcctcctcg 13380 gtcgccgtct acggtcacgg tgaaacgccg atgcgagaaa cctccatccc ggtgccggcg 13440 gacagctacg gcaacgccaa gctcgtcatc gagcgcgaac tcgaggtgac ggcgcggacg 13500 cagggccttc 
cgttcaccgc cttccgcatg cacaacgtct acggcgagtg gcagaacatg 13560 cgcgacccgt accggaacgc ggtcgcgatc ttcttcaacc agatcctgcg tggcgagccg 13620 atcacggtct acggcgacgg cggtcaggtg cgggcgttca cgtacgtggg cgacgtcgtg 13680 gacgtggtgt gccaggcgcc cgacgtcgag gaggcctggg gccggagctt caacgtgggc 13740 gcggccagca ccaacaccgt gctggagctc gcggaggcgg tccgggtggc ggccggcgtt 13800 ccggatcatc cgatcgtgca cctgcccgcg cgcgacgagg tccgggtggc gtacaccgcg 13860 accgacagcg cccggaaggt cttcggcgac tgggcggaca ccccgctggc ggacggactg 13920 gcccggaccg ccacgtgggc ggccggtgtg ggaccgacgg aactgcgatc gtcgttcgac 13980 atcgagatcg gcggccatca ggttccggag tgggcgcggc ttgtcgaaaa gcgcctggga 14040 tcggcgcctc gctgacagtg gtgaaaacac cagtttcccg cgcgcacccg aacactaggc 14100 ttggaatcca tggaccgtag ggagattcag cgtcgcgcga aggaactcgt agccgtgggt 14160 gaacggattc gagttcgagg gaattc 14186 <210> SEQ ID NO 23 <211> LENGTH: 296 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 23 Met Ser Ser Val Gly Asp Leu Arg Val Val Asn Arg Ile Ala Glu Cys 1 5 10 15 Asp Ile Arg Arg Thr Gly Leu Leu Pro Glu His Val Thr Ala Phe Arg 20 25 30 Arg Gln Gly Val Leu Val Val Arg Gly Leu Leu Thr Pro Gln Glu Leu 35 40 45 Ala Asp Val Gln Glu Ala Gly Arg Ala Leu Ile Asp Arg Ala Trp Ser 50 55 60 Thr Arg Ser Met Glu Asp Thr Val Trp Thr Leu Glu Pro Asp Gln Pro 65 70 75 80 Gly Ala Ala Pro Val Arg Ile Glu Tyr Val Val Asp Lys Ala Arg Pro 85 90 95 Ile Ala Met Leu Ala Gly His Pro Leu Leu Leu Arg Ile Met Glu Gln 100 105 110 Leu Val Gly Pro Asn Leu Ile Pro Thr Trp Asp Ser Met Val Phe Lys 115 120 125 Thr Pro Ala Gly Ala Pro Arg Leu Ala Trp His Arg Asp Ala Gly Leu 130 135 140 Tyr Asp Asn Ala Val Gly Val Thr Gly Ala Gly Arg Val Ile Asp Ala 145 150 155 160 Gly Ile Tyr Leu Asp Pro Ala Pro Glu Asp Asn Cys Val Trp Cys Ile 165 170 175 Pro Glu Ser Asn Tyr Trp Gly Asp Asp Arg Leu Thr Ala Thr Ala Asp 180 185 190 Gln Leu Asn Ala Ser Glu Trp Asp Thr Thr Gly Ala Val Pro Ala Val 195 200 205 Met Gln Pro Gly Asp Leu Leu Leu His Asn Ile Leu Thr Leu His Gly 210 215 220 Ala Pro Ala Val Val Gly Lys 
Gln Arg Arg Val Ile Tyr Phe Glu Tyr 225 230 235 240 Arg Pro Ala Glu Val Glu Trp Gln Leu Gly Pro His Ser Ala Glu Tyr 245 250 255 Ile Gly Leu Lys Gln Gln Val Leu Arg Ser Cys Ile Gln Met Arg Ala 260 265 270 Asn Glu Pro Gln Phe Gly Asp Glu Glu Pro Phe Asp Tyr Gln Pro Ala 275 280 285 Glu Ser Leu Arg His Trp Val Asp 290 295 <210> SEQ ID NO 24 <211> LENGTH: 243 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 24 Val Ser Gly Ser Leu Ala Ala Asp Pro Ala Gln Arg Asn Glu Asp Tyr 1 5 10 15 Asp Arg Leu Thr Val Glu Ile Ile Glu Arg Val Cys Gly Arg Thr Ala 20 25 30 Val Ser Val Asp Leu Gly Ala Gly Val Gly Glu Ile Thr Gln His Leu 35 40 45 Val Arg Val Ala Pro Glu Gly Asn His Phe Ala Val Glu Pro Leu Pro 50 55 60 Ala Leu Ala Asp Glu Leu Ala Asp Arg Leu Pro Ser Val Thr Val Val 65 70 75 80 Arg Ala Ala Ala Ala Asp Ala Ala Gly Arg His Ser Phe Val His Val 85 90 95 Val Ser Asn Pro Gly Tyr Ser Gly Leu Arg Arg Arg Pro Tyr Asp Arg 100 105 110 Pro Ala Glu Thr Leu His Glu Ile Thr Val Asp Thr Val Arg Leu Asp 115 120 125 Asp Val Ile Pro Ala Asp Val Arg Val Asp Leu Ile Lys Ile Asp Ile 130 135 140 Glu Gly Gly Glu Val Leu Ala Leu Arg Gly Ala Arg Asp Thr Leu Arg 145 150 155 160 Arg Gly Arg Pro Val Ile Val Phe Glu His Gly Gly Asp Asn Val Met 165 170 175 Arg Glu Tyr Gly Thr Thr Thr Asp Asp Leu Trp Ser Leu Leu Val Glu 180 185 190 Glu Leu Gly Tyr Gln Val Phe Thr Leu Pro Ala Trp Leu Ala Gly Gly 195 200 205 Arg Ala Leu Ser Arg Ala Ala Leu Thr Thr Ala Leu Glu Gln Asp Trp 210 215 220 Tyr Phe Val Ala Asp Arg Ala Ala Gly Pro Ala Thr Ala Asp Gln Lys 225 230 235 240 Gly Val Asp <210> SEQ ID NO 25 <211> LENGTH: 482 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 25 Met Val Ser Thr Val Leu Val Ile Gly Gly Gly Pro Ala Gly Ser Thr 1 5 10 15 Ala Ala Ala Leu Leu Ala Arg Ala Gly Leu Ser Val Thr Leu Leu Glu 20 25 30 Lys Glu Thr Phe Pro Arg Tyr His Ile Gly Glu Ser Ile Ala Ser Ser 35 40 45 Cys Arg Thr Ile Val Asp Phe Val Gly Ala Leu Ser Asp Val Asp Ala 50 55 60 Arg Gly Tyr Thr Gln Lys Asn Gly Val Leu Leu Arg Trp Gly Lys Glu 65 70 75 80 Asp Trp Ala Ile Asp Trp Thr Glu Ile Phe Gly Pro Gly Val Arg Ser 85 90 95 Trp Gln Val Asp Arg Asp Asp Phe Asp His Val Leu Leu Asn Asn Ala 100 105 110 Ala Lys Gln Gly Ala Thr Val Ile Gln Asn Ala Glu Val Asn Arg Val 115 120 125 Ile Phe Asp Gly Asp Pro Pro Arg Trp Pro Arg Ser Gly Ala Glu Pro 130 135 140 Asp Ser Gly Glu Arg Arg Thr Thr Glu Phe Asp Phe Val Val Asp Ala 145 150 155 160 Ser Gly Arg Ala Gly Met Ile Pro Ala Arg His Phe Lys His Arg Arg 165 170 175 Ala Asn Asp Thr Phe Lys Asn Val Ala Ile Trp Gly Tyr Trp Asp Gly 180 185 190 Gly Ser Leu Leu Pro Asn Ser Pro Gln Gly Gly Ile Asn Val Ile Gly 195 200 205 Ala Pro Asp Gly Trp Tyr Trp Val Ile Pro Leu Arg Gly Asn Arg Tyr 210 215 220 Ser Val Gly Phe Val Cys His Gln Lys Arg Phe Leu Glu Arg Arg Ser 225 230 235 240 Glu His Gly Ser Leu Glu Asp Met Leu Ala Ala Leu Val Glu Glu Ser 245 250 255 Pro Thr Val Arg Ser Leu Val Ala Thr Gly Thr Tyr Gln Pro Gly Val 260 265 270 Arg Val Glu Gln Asp Phe Ser Tyr Val Ser Asp Ser Phe Cys Gly Pro 275 280 285 Gly Tyr Phe Ala Ala Gly Asp Ser Ala Cys Phe Leu Asp Pro Leu Leu 290 295 300 Ser Thr Gly Val His Leu Ala Leu Tyr Ser Gly Met Leu Ala Ser Ala 305 310 315 320 Ser Ile Leu Gly Ile Val Asn Gly Asp Val Glu Glu Glu Gln Ala Tyr 325 330 335 Gly Phe Tyr Glu Thr Leu Tyr Arg Asn Ala Phe Glu Arg Leu Phe Thr 340 345 350 Leu Val Ala Ala Val Tyr Gln Gln Gln Ala Gly Lys Ala Asn Tyr Phe 355 360 365 Ala Leu Ala Asp Arg Leu Ile Gly Glu His Asp Glu Ala Glu Phe Glu 370 375 380 Arg Val Asp Gly Ala Lys Ala Phe Ala Gln Leu Ile Ala Gly Leu Ala 385 390 395 400 Asp Val Ser Asp Ala Met Ala Gly Arg Ser Val Pro Pro Gln Leu Pro 405 
410 415 Ala Ala Asp Ala Gly Asn Ser Val Gly Gln Leu Phe Leu Ala Ala Glu 420 425 430 Gln Ala Arg Arg Met Ala Glu Ala Gly Val Pro Lys Ala Pro Val Ser 435 440 445 Glu Gly Leu Asn Lys Ile Asp Gly Leu Glu Leu Phe Asp Pro Glu Thr 450 455 460 Gly Leu Tyr Leu Met Thr Ser Pro Arg Leu Gly Ile Gly Arg Thr Arg 465 470 475 480 Pro Ala <210> SEQ ID NO 26 <211> LENGTH: 380 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 26 Val Lys Ile Leu Phe Leu Pro Gly Pro Val Lys Ser Asn Val Phe Gly 1 5 10 15 Val Gly Ala Leu Ala Val Ala Ala Arg Val Ser Gly His Glu Val Ile 20 25 30 Val Ala Ser Thr Val Glu Gly Ala Ala Ala Ala Thr Gly Ile Gly Leu 35 40 45 Pro Ala Val Thr Thr Ser Glu Leu Thr Leu Thr Gln Leu Leu Thr Thr 50 55 60 Asp Arg Ala Gly Asn Ala Leu Glu Phe Pro Thr Asp Pro Ala Glu Leu 65 70 75 80 Pro Thr Phe Val Gly His Met Phe Gly Arg Leu Ala Ala Val Asn Leu 85 90 95 Gly Pro Thr Arg Asp Leu Val Thr Gly Trp Arg Pro Asp Val Leu Val 100 105 110 Ser Gly Pro His Ala Tyr Ala Gly Pro Leu Leu Ala Ala Glu Phe Gly 115 120 125 Leu Pro Cys Ala Arg His Leu Leu Thr Gly Thr Pro Ile Asp Arg Asp 130 135 140 Gly Thr His Pro Gly Val Glu Asp Glu Leu Glu Pro Glu Leu Ser Ala 145 150 155 160 Leu Gly Leu Asp Arg Val Pro Asp Phe Asp Leu Ala Ile Asp Ile Phe 165 170 175 Pro Ala Ser Ile Arg Pro Ala Gly Gly Pro Val Gln Pro Met Arg Trp 180 185 190 Thr Pro Thr Ser Glu Gln Arg Pro Val Glu Pro Trp Met Val Thr Pro 195 200 205 Gly Asp Arg Arg Arg Val Leu Leu Thr Ala Gly Ser Leu Val Thr Pro 210 215 220 Thr His Gly Met Asp Leu Leu Trp Asn Leu Val Thr Ala Leu Ala Asp 225 230 235 240 Leu Asp Val Glu Leu Val Val Ala Ala Pro Glu Glu Val Gly Ala Leu 245 250 255 Val Arg Lys Met Pro Gly Val Ala His Ala Gly Trp Val Pro Leu Asp 260 265 270 Met Val Leu Pro Thr Cys Ala Leu Ile Val His His Ser Gly Thr Met 275 280 285 Thr Ala Leu Thr Ala Met Gln Ala Gly Val Pro Gln Leu Ile Ile Pro 290 295 300 Gln Glu Ser Arg Phe Val Asp Trp Ala Gly Met Leu Ala Thr Lys Gly 305 310 315 320 Ile Ala Ile Ser Leu Pro Pro Gly Ala Asp Thr Glu Asp Ala 
Leu Ala 325 330 335 Gly Ala Ala Arg Arg Leu Leu Thr Glu Pro Ala Tyr Ala Thr Ala Ala 340 345 350 Arg Ala Leu Ala Asp Glu Ile Ala Glu Met Pro Leu Pro Val Thr Val 355 360 365 Val Asp Val Leu Arg Asp Leu Thr Glu Lys Ala Arg 370 375 380 <210> SEQ ID NO 27 <211> LENGTH: 325 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 27 Val Thr Thr Glu Pro Asp Arg Ser Arg Tyr Leu Tyr Arg Gln Met Arg 1 5 10 15 Leu Ile Arg Glu Phe Glu Glu His Cys Leu Glu Met Ala Val Ala Gly 20 25 30 Thr Ile Val Gly Gly Ile His Pro Tyr Ile Gly Gln Glu Ala Val Ala 35 40 45 Val Gly Val Ser Ala His Leu Arg Glu Asp Asp Val Ile Thr Ser Thr 50 55 60 His Arg Gly His Gly His Val Leu Ala Lys Gly Ala Asp Pro Lys Arg 65 70 75 80 Thr Leu Ala Glu Leu Tyr Gly Ala Ser Thr Gly Leu Asn Arg Gly Arg 85 90 95 Gly Gly Ser Met His Ala Ala Asp Val Gly Leu Gly Val Tyr Gly Ala 100 105 110 Asn Gly Ile Val Gly Ala Gly Ala Pro Ile Ala Val Gly Ala Ala Trp 115 120 125 Ala Ala Arg Arg Gln Gly Arg Asp Gln Gln Val Ala Val Ala Tyr Phe 130 135 140 Gly Asp Gly Ala Leu Ser Gln Gly Val Val Leu Glu Ala Phe Asn Leu 145 150 155 160 Ala Ala Leu Trp Ser Leu Pro Val Leu Phe Val Cys Glu Asn Asn Gly 165 170 175 Tyr Ala Ile Ser Leu Pro Val Asp Arg Gly Leu Ala Gly Asp Pro Val 180 185 190 Arg Arg Ala Ala Gly Phe Gly Leu Thr Ala Glu Ala Val Asp Gly Met 195 200 205 Asp Val Glu Ala Val Thr Glu Ala Ala Gly Arg Ala Val Ala Ala Cys 210 215 220 Arg Ala Gly Gly Gly Pro His Phe Leu Glu Cys Val Thr Tyr Arg Phe 225 230 235 240 Arg Gly His His Thr Val Glu His Leu Met Gly Ile Asn Tyr Arg Asp 245 250 255 Glu Ala Glu Val Ala Ser Trp Thr Glu Arg Asp Pro Leu Ala Arg Gln 260 265 270 Arg Ala Arg Leu Ala Pro Ala Val Ala Asp Glu Val Asp Ala Glu Ile 275 280 285 Ala Ala Leu Ile Ala Glu Ala Val Ala Phe Ala Gly Ser Ser Pro Gly 290 295 300 Ser Asp Pro Arg Asp Ala Leu Asp Tyr Leu Tyr Ala Gly Thr Ala Pro 305 310 315 320 Thr Arg Pro Gly Ala 325 <210> SEQ ID NO 28 <211> LENGTH: 320 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 28 Met Pro Ser Leu Ser Tyr Ile Ala Ala Leu Asn Gln Ala Leu Arg Asp 1 5 10 15 Glu Met Ala Arg Asp Glu Arg Val Cys Ile Phe Gly Glu Asp Val Cys 20 25 30 Leu Gly Leu Thr Gly Ile Thr Lys Gly Leu Ala Glu Ala His Asp Gly 35 40 45 Arg Val Val Asp Thr Pro Leu Ser Glu Gln Ala Phe Thr Ser Leu Ala 50 55 60 Thr Gly Ala Ala Ile Ala Gly Gln Arg Pro Val Val Glu Phe Gln Ile 65 70 75 80 Pro Ser Leu Leu Tyr Leu Val Phe Glu Gln Ile Ala Asn Gln Ala His 85 90 95 Lys Phe Ser Leu Met Thr Gly Gly Gln Ala Ser Val Pro Val Thr Tyr 100 105 110 Leu Val Pro Gly Ser Gly Ser Arg Ser Gly Met Ala Gly Gln His Ser 115 120 125 Asp His Pro Tyr Ser Leu Leu Ala His Val Gly Val Lys Thr Ala Val 130 135 140 Pro Ala Thr Pro Ser Asp Ala Tyr Gly Leu Leu Leu Ser Ala Ile Arg 145 150 155 160 Glu Pro Asp Pro Val Ala Val Phe Ala Pro Thr Leu Leu Met Gly Thr 165 170 175 Ser Glu Glu Ile Asp Gly Asp Leu Asp Ala Val Pro Leu Gly Ser Ala 180 185 190 Arg Thr His Arg Glu Gly Thr Asp Val Thr Val Val Ala Val Gly His 195 200 205 Leu Val Pro Val Ala Leu Gln Val Ala Ala Asp Leu Ala Gly Glu Ala 210 215 220 Ser Val Glu Val Ile Asp Pro Arg Thr Val Tyr Pro Val Asp Trp Glu 225 230 235 240 Thr Leu Gly Lys Ser Ile Ser Arg Thr Gly Arg Leu Val Val Ile Asp 245 250 255 Asp Ser Asn Arg Met Cys Gly Phe Gly Ala Glu Ile Ala Ala Thr Ala 260 265 270 Ala Glu Glu Phe Gly Leu Ala Val Pro Pro Lys Arg Val Ser Arg Pro 275 280 285 Asp Gly Ala Val Ile Pro Tyr Ala Leu Asn Leu Asp His Ala Leu Leu 290 295 300 Pro Asp Ala Leu Glu Leu Thr Lys Ala Ile Arg Ala Val Leu Arg Arg 305 310 315 320 <210> SEQ ID NO 29 <211> LENGTH: 337 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 29 Met Thr Ser Gly Arg Pro Arg Val Ala Thr Val Thr Val Thr Thr Asn 1 5 10 15 Glu Ser Lys Trp Leu Arg Arg Cys Leu Gly Ala Leu Val Asp Ser Asp 20 25 30 Thr Glu Gly Phe Asp Leu Asp Val His Leu Ile Asp Asn Ala Ser Thr 35 40 45 Asp Gly Ser Ala Glu Leu Val Ala Arg Glu Phe Pro Ser Val Lys Ile 50 55 60 Thr Arg Asn Pro Thr Asn Leu Gly Phe Ala Gly Ala Asn Asn Val Gly 65 70 75 80 Ile Arg Ala Ala Leu Ala Ala Gly Ala Asp Tyr Val Phe Leu Val Asn 85 90 95 Pro Asp Thr Trp Thr Pro Pro Arg Leu Val Arg Ala Met Val Glu Phe 100 105 110 Ala Glu Arg Trp Pro Glu Tyr Gly Ile Val Gly Pro Leu Gln Tyr Arg 115 120 125 Tyr Asp Ala Glu Ser Thr Glu Leu Val Glu Phe Asn Asp Trp Thr Asn 130 135 140 Thr Ala Leu Trp Leu Gly Glu Gln His Ala Phe Ala Gly Asp Gly Met 145 150 155 160 Ala His Pro Ser Pro Ala Gly Ser Pro Gln Gly Arg Ala Pro Arg Thr 165 170 175 Leu Glu His Ala Tyr Val Gln Gly Ala Ala Leu Phe Ala Arg Val Ala 180 185 190 Met Leu Arg Glu Val Gly Val Phe Asp Glu Val Phe His Thr Tyr Tyr 195 200 205 Glu Glu Val Asp Leu Cys Arg Arg Ala Arg Trp Ala Gly Trp Arg Val 210 215 220 Ala Leu Leu Leu Asp Glu Gly Leu Gln His His Gly Gly Gly Gly Ala 225 230 235 240 Ala Thr Arg Ser Ala Tyr Thr Arg Val His Met Arg Arg Asn Arg Tyr 245 250 255 Tyr Tyr Leu Leu Thr Asp Val Asp Trp His Pro Thr Lys Ala Thr Arg 260 265 270 Leu Ala Ala Arg Trp Leu Val Ala Asp Leu Val Gly Arg Thr Val Val 275 280 285 Gly Arg Val Asp Pro Met Thr Gly Ala Arg Glu Thr Leu Ala Ala Val 290 295 300 Arg Trp Leu Ala Gly His Ala Pro Thr Ile Ala Glu Arg Arg Arg Ser 305 310 315 320 His Arg Ala Leu Arg Ala Gly Arg Thr Pro Ala Arg Arg Glu Val Ala 325 330 335 Ser <210> SEQ ID NO 30 <211> LENGTH: 350 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 30 Val Thr Gly Pro Arg Ile Leu Ile Ser Gly Asn Phe His Trp Gln Ala 1 5 10 15 Gly Phe Ser His Thr Val Glu Gly Tyr Val Arg Ala Ala Gly Ala Ala 20 25 30 Gly Cys Glu Val Arg Val Ser Gly Pro Leu Ser Arg Met Asp Asp Gln 35 40 45 Val Pro Gly Leu Leu Pro Val Glu Pro Asp Leu Gly Trp Gly Thr His 50 55 60 Leu Val Val Met Phe Glu Ala Arg Gln Phe Leu Thr Pro Glu Gln Ile 65 70 75 80 Glu Leu Ala Thr Arg Thr Phe Pro Arg Ser Arg Arg Leu Val Val Asp 85 90 95 Phe Asp Leu His Trp Ala Asp Glu His Pro Glu Leu Gly Val Asp Gly 100 105 110 Thr Ala Gly Lys Tyr Thr Ala Glu Ser Trp Arg Ser Leu Tyr Ser Glu 115 120 125 Leu Ser Asp Val Met Leu Gln Pro Lys Leu Thr Gly Lys Met Ala Pro 130 135 140 Gly Ala Glu Phe Phe Ser Cys Ile Gly Met Pro Glu Thr Val Cys His 145 150 155 160 Pro Leu Thr Leu Gly Arg Gln Arg Asp Tyr Asp Leu Gln Tyr Ile Gly 165 170 175 Ser Asn Trp Trp Arg Trp Glu Pro Leu Thr Ala Leu Val Glu Ala Ala 180 185 190 Val Thr Leu Arg Pro Val Pro Arg Met Arg Val Cys Gly Arg Phe Trp 195 200 205 Asp Gly Ala Thr Ser Pro Gly Phe Glu Asp Ala Thr Thr Ser Val Pro 210 215 220 Gly Trp Leu Ala Glu Arg Gly Val Glu Leu Cys Pro Pro Val Ala Phe 225 230 235 240 Gly Gln Val Ile Pro Glu Met Gly Arg Ser Leu Ile Ser Pro Val Leu 245 250 255 Val Arg Pro Leu Val Ala Gly Thr Gly Leu Leu Thr Pro Arg Met Phe 260 265 270 Glu Thr Leu Ala Ser Gly Ala Leu Pro Ala Leu Ser Ala Asp Ala Glu 275 280 285 Phe Leu Ala Glu Val Tyr Gly Asp Glu Cys Ala Pro Leu Leu Leu Gly 290 295 300 Asp Asp Pro Ala Thr Thr Leu Ala Arg Leu Thr Thr Asp Phe Glu Arg 305 310 315 320 His Ala Arg Ile Val Gly Arg Ile Gln Asp Arg Val Arg Glu Glu Tyr 325 330 335 Gly Tyr Pro Arg Val Leu Arg Asn Leu Leu Ala Phe Phe Gly 340 345 350 <210> SEQ ID NO 31 <211> LENGTH: 252 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 31 Met Asp Ala Met Glu Val Val Gly Thr Ile Asp His Arg Asp Arg Glu 1 5 10 15 Glu Phe Arg Ser Arg Gly Phe Ala Ile Leu Pro Gln Val Ala Ser Glu 20 25 30 Ser Glu Val Ala Trp Leu Arg Gln Ala Tyr Asp Arg Leu Phe
Val Arg 35 40 45 Arg Ala Thr Pro Gly Ala Glu Asp Phe Tyr Asp Ile Ala Gly Gln Arg 50 55 60 Asp Arg Glu Gly Pro Pro Leu Leu Pro Gln Ile Ile Lys Pro Glu Lys 65 70 75 80 Tyr Val Pro Glu Leu Leu Asp Ser Pro His Phe Ala Arg Cys Arg Ser 85 90 95 Ile Ala Ser Ala Phe Leu Asp Met Ala Glu Glu Glu Leu Glu Phe Tyr 100 105 110 Gly His Ala Ile Leu Lys Pro Pro Arg Tyr Gly Ala Pro Thr Pro Trp 115 120 125 His Gln Asp Glu Ala Tyr Met Asp Pro Arg Trp Arg Arg Arg Gly Leu 130 135 140 Ser Ile Trp Thr Thr Leu Asp Glu Ala Thr Val Glu Ser Gly Cys Leu 145 150 155 160 His Tyr Leu Pro Gly Gly His Arg Gly Pro Val Leu Pro His His His 165 170 175 Ile Asp Asn Asp Asp Arg Ile Arg Gly Leu Met Thr Asp Asp Val Asp 180 185 190 Pro Thr Ser Ala Val Ala Cys Pro Leu Ala Pro Gly Gly Ala Val Val 195 200 205 His Asp Phe Arg Thr Pro His Tyr Ala Gly Pro Asn Leu Thr Asp Gln 210 215 220 Pro Arg Arg Ala Tyr Val Leu Val Phe Met Ser Ala Pro Ala Glu Val 225 230 235 240 Ala Asp Pro Glu Pro Arg Pro Trp Met Asp Trp Gly 245 250 <210> SEQ ID NO 32 <211> LENGTH: 309 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 32 Val Pro Thr Ala Ile Val Val Gly Ala Glu Gly Gln Asp Gly Val Leu 1 5 10 15 Leu Ser Arg Leu Leu Arg Ala His Asp Tyr Arg Val Val Pro Val Gly 20 25 30 Arg His Gly Pro Val Asp Ile Val Arg Pro Asp Asp Val Ala Glu Leu 35 40 45 Val Thr Glu Leu Arg Pro Asp Glu Ile Tyr Leu Leu Ala Ala Val Gln 50 55 60 Asn Ser Ala Gln Asp Pro Val Ala Asp Pro Val Glu Leu Ala His Arg 65 70 75 80 Ser Tyr Ala Val Asn Thr Leu Ala Val Val His Phe Leu Glu Ala Val 85 90 95 Glu Arg His Ser Pro Ala Thr Arg Val Phe Tyr Ala Ala Ser Ser His 100 105 110 Val Phe Gly Arg Pro Asp Thr Pro Val Gln Asp Glu Thr Thr Pro Leu 115 120 125 Arg Pro Thr Ser Val Tyr Gly Ile Ser Lys Ala Ala Gly Leu Leu His 130 135 140 Cys Arg Ser Tyr Arg Ala Arg Gly Val Phe Ala Ser Val Gly Ile Leu 145 150 155 160 Tyr Ser His Glu Ser Pro Leu Arg Arg Pro Gly Phe Val Ser Arg Lys 165 170 175 Ile Val Asp Ala Val Val Arg Ile Gln Arg Gly Glu Ala Phe Arg Leu 180 185 190 Val Leu Gly Gly Leu Ala Ala Glu Val Asp Trp Gly Tyr Ala Pro Asp 195 200 205 Tyr Val Asp Ala Met Arg Arg Ile Leu Gly Leu Ala Thr Ala Asp Asp 210 215 220 Tyr Val Val Ala Ser Gly Val Arg Arg Thr Val Arg Glu Phe Ala Glu 225 230 235 240 Thr Ala Phe Ala Ala Val Gly Leu Asp Trp Arg Asp His Val Glu Glu 245 250 255 Asn Ala Ala Val Leu Thr Arg Pro Ser Val Pro Leu Val Gly Asp Ala 260 265 270 Ser Arg Leu Gln Ala Ala Thr Gly Trp Arg Pro Ser Val Asp Phe Ala 275 280 285 Gly Met Val Arg Ala Leu Leu Arg Ala Ala Gly Ala Asp Leu Val Gly 290 295 300 Thr Gly Gln Asp Gly 305 <210> SEQ ID NO 33 <211> LENGTH: 355 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 33 Val Lys Ala Leu Val Leu Ala Gly Gly Ile Gly Ser Arg Met Arg Pro 1 5 10 15 Ile Thr His Thr Ser Ala Lys Gln Leu Ile Pro Val Ala Asn Lys Pro 20 25 30 Val Leu Phe Tyr Gly Leu Glu Ala Ile Arg Asp Ala Gly Ile Arg Glu 35 40 45 Val Gly Ile Ile Val Gly Ser Thr Ala Pro Glu Ile Glu Arg Ala Val 50 55 60 Gly Asp Gly Ser Gln Phe Gly Leu Lys Val Thr Tyr Leu Pro Gln Asp 65 70 75 80 Ala Pro Arg Gly Leu Gly His Ala Val Leu Ile Ala Arg Asp Phe Leu 85 90 95 Gly Asp Asp Asp Phe Val Met Tyr Leu Gly Asp Asn Phe Val Leu Gly 100 105 110 Gly Ile Asn Asp Ala Val Glu Arg Phe Arg Arg Glu Arg Pro His Ala 115 120 125 Gln Leu Met Leu Thr Lys Val Lys Asp Pro His Ala Phe Gly Ile Ala 130 135 140 Thr Met Gly Pro Asp Gly Arg Val Val Asp Val Glu Glu Lys Pro Arg 145 150 155 160 Tyr Pro Lys Ser Asp Leu Ala Leu Val Gly Val Tyr Val Phe Ser Pro 165 170 175 Val Val His Glu Ala Ile Ala Glu Leu Lys Pro Ser Trp Arg Asn Glu 180 185 190 Leu Glu Ile Thr Asp Ala Ile Gln Trp Leu Ile Asp His Asp Arg Arg 195 200 205 Ile Glu Ser Thr Ile Ile Thr Gly Phe Trp Lys Asp Thr Gly Ser Leu 210 215 220 Ala Asp Met Leu Glu Met Asn Arg Phe Ile Leu Glu Ser Leu Asp Ser 225 230 235 240 Glu Val Ser Gly Glu Val Ser Ala Asp Thr Glu Ile Thr Gly Arg Val 245 250 255 Val Ile Gly Pro Gly Ala Val Ile Thr Gly Ser Arg Ile Ile Gly Pro 260 265 270 Val Val Val Gly Ala Gly Ser Ile Ile Arg Asn Ser Gln Leu Gly Pro 275 280 285 Phe Thr Ser Ile Asp Cys Asp Cys Thr Val Ile Asp Ser Glu Ile Glu 290 295 300 Gln Ser Ile Val Leu Arg Gly Ala Phe Ile Asp Gly Ile Gly Arg Ile 305 310 315 320 Glu Trp Ser Met Ile Gly Arg Glu Ala Arg Leu Thr Pro Gly Pro Arg 325 330 335 Ala Pro Lys Thr Tyr Arg Phe Val Leu Gly Asp His Ser Glu Val Arg 340 345 350 Val Gly Val 355 <210> SEQ ID NO 34 <211> LENGTH: 329 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 34 Val Pro Arg Val Phe Val Ala Gly Gly Ala Gly Phe Ile Gly Ser His 1 5 10 15 Tyr Val Arg Glu Leu Val Ala Gly Ala Tyr Ala Gly Trp Gln Gly Cys 20 25 30 Glu Val Thr Val Leu Asp Ser Leu Thr Tyr Ala Gly Asn Leu Ala Asn 35 40 45 Leu Ala Gly Val Arg Asp Ala Val Thr Phe Val Arg Gly Asp Ile Cys 50 55 60 Asp Gly Arg Leu Leu Ala Glu Val Leu Pro Gly His Asp Val Val Leu 65 70 75 80 Asn Phe Ala Ala Glu Thr His Val Asp Arg Ser Ile Ala Asp Ser Ala 85 90 95 Glu Phe Leu Arg Thr Asn Val Gln Gly Val Gln Ser Leu Met Gln Ala 100 105 110 Cys Leu Thr Ala Gly Val Pro Thr Ile Val Gln Val Ser Thr Asp Glu 115 120 125 Val Tyr Gly Ser Ile Glu Ala Gly Ser Trp Ser Glu Asp Ala Pro Leu 130 135 140 Ala Pro Asn Ser Pro Tyr Ala Ala Ala Lys Ala Gly Gly Asp Leu Ile 145 150 155 160 Ala Leu Ala Tyr Ala Arg Thr Tyr Gly Leu Pro Val Arg Ile Thr Arg 165 170 175 Cys Gly Asn Asn Tyr Gly Pro Tyr Gln Phe Pro Glu Lys Val Ile Pro 180 185 190 Leu Phe Leu Thr Arg Leu Met Asp Gly Arg Ser Val Pro Leu Tyr Gly 195 200 205 Asp Gly Arg Asn Val Arg Asp Trp Ile His Val Ala Asp His Cys Arg 210 215 220 Gly Ile Gln Thr Val Val Glu Arg Gly Ala Ser Gly Glu Val Tyr His 225 230 235 240 Ile Ala Gly Thr Ala Glu Leu Thr Asn Leu Glu Leu Thr Gln His Leu 245 250 255 Leu Asp Ala Val Gly Gly Ser Trp Asp Ala Val Glu Arg Val Pro Asp 260 265 270 Arg Lys Gly His Asp Arg Arg Tyr Ser Leu Ser Asp Ala Lys Leu Arg 275 280 285 Ala Leu Gly Tyr Ala Pro Arg Val Pro Phe Ala Asp Gly Leu Ala Glu 290 295 300 Thr Val Ala Trp Tyr Arg Ala Asn Arg His Trp Trp Glu Pro Leu Arg 305 310 315 320 Lys Gln Leu Asp Ala Val Pro His Asp 325 <210> SEQ ID NO 35 <211> LENGTH: 342 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 35 Met Ala His Cys Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His 1 5 10 15 Leu Ala Gly Arg Leu Thr Ser Asp Gly His Arg Val Thr Val Leu Asp 20 25 30 Asp Leu Ser Gly Gly Ser Ala Ser Arg Val Pro Ala Gly Ala Asp Leu 35 40 45 Ile Val Gly Ser Val Thr Asp Ala Asp Leu Val Glu Arg Ala Phe Ala 50 55 60 Glu His Arg Phe Asp Arg Val Phe His Phe Ala Ala Phe Ala Ala Glu 65 70 75 80 Ala Ile Ser His Ser Val Lys Lys Leu Asn Tyr Gly Thr Asn Val Met 85 90 95 Gly Ser Ile Asn Leu Ile Asn Ala Ser Leu Gln Thr Gly Val Ser Phe 100 105 110 Phe Cys Phe Ala Ser Ser Val Ala Val Tyr Gly His Gly Glu Thr Pro 115 120 125 Met Arg Glu Thr Ser Ile Pro Val Pro Ala Asp Ser Tyr Gly Asn Ala 130 135 140 Lys Leu Val Ile Glu Arg Glu Leu Glu Val Thr Ala Arg Thr Gln Gly 145 150 155 160 Leu Pro Phe Thr Ala Phe Arg Met His Asn Val Tyr Gly Glu Trp Gln 165 170 175 Asn Met Arg Asp Pro Tyr Arg Asn Ala Val Ala Ile Phe Phe Asn Gln 180 185 190 Ile Leu Arg Gly Glu Pro Ile Thr Val Tyr Gly Asp Gly Gly Gln Val 195 200 205 Arg Ala Phe Thr Tyr Val Gly Asp Val Val Asp Val Val Cys Gln Ala 210 215 220 Pro Asp Val Glu Glu Ala Trp Gly Arg Ser Phe Asn Val Gly Ala Ala 225 230 235 240 Ser Thr Asn Thr Val Leu Glu Leu Ala Glu Ala Val Arg Val Ala Ala 245 250 255 Gly Val Pro Asp His Pro Ile Val His Leu Pro Ala Arg Asp Glu Val 260 265 270 Arg Val Ala Tyr Thr Ala Thr Asp Ser Ala Arg Lys Val Phe Gly Asp 275 280 285 Trp Ala Asp Thr Pro Leu Ala Asp Gly Leu Ala Arg Thr Ala Thr Trp 290 295 300 Ala Ala Gly Val Gly Pro Thr Glu Leu Arg Ser Ser Phe Asp Ile Glu 305 310 315 320 Ile Gly Gly His Gln Val Pro Glu Trp Ala Arg Leu Val Glu Lys Arg 325 330 335 Leu Gly Ser Ala Pro Arg 340 <210> SEQ ID NO 36 <211> LENGTH: 14071 <212> TYPE: DNA <213> ORGANISM: M. 
carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (210)..(1271) <223> OTHER INFORMATION: ORF 31 (positive strandedness) <221> NAME/KEY: Unsure <222> LOCATION: (27)..(27) <223> OTHER INFORMATION: n at position 27 is unknown and represents a or g or c or t <221> NAME/KEY: misc_feature <222> LOCATION: (1432)..(5232) <223> OTHER INFORMATION: ORF 32 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (5550)..(6458) <223> OTHER INFORMATION: ORF 33 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (6458)..(7378) <223> OTHER INFORMATION: ORF 34 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (7363)..(8247) <223> OTHER INFORMATION: ORF 35 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (9384)..(10406) <223> OTHER INFORMATION: ORF 36 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (10406)..(11815) <223> OTHER INFORMATION: ORF 37 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (11815)..(12756) <223> OTHER INFORMATION: ORF 38 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (13059)..(13889) <223> OTHER INFORMATION: ORF 39 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (13923)..(14069) <223> OTHER INFORMATION: ORF 40 (negative strandedness) incomplete: C-terminus only (N-terminus is on next DNA contig; gap in between contig 6 and <400> SEQUENCE: 36 atgcggaggg ctccgaccac ggatatntcc tcaggcagga tccagccgat cgggcccggg 60 cgttctacga ggcgtttccc ggtgcgaccc ggatcctgga gctcggtgcg ctcgagggtg 120 cggacaccct cgcattggcc cgacagcccg gcaccagcat tctcgggctc gagggtcgcg 180 aggagaatct gcgtcgcgcc gagttcgtga tggaggtgca cggtgccacc aatgtggaac 240 tgcggatcgc cgacgtggag acgctcgact tcgccaccct ggggcggttc gacgccgtcc 300 tctgcgccgg cctgctgtat cacgtccggg agccctgggc gctgctcaag gacgccgccc 360 gggtttccgc cgggatctac ctgtcgaccc actactgggg cagttccgac gggctggaga 420 cgctggacgg gtattccgtc aagcacgtcc gtgaggagca cccggagcct caggcccgcg 480 
ggctgagcgt ggacgtgcgc tggttggacc gggcctcgct gttcgcggcc ctggagaatg 540 ccggcttcgt cgagatcgag gtgctgcacg agcgcacgtc ggcggaggtc tgcgacatcg 600 tcgtggtcgg ccgtgcccgg ctgggtgcgc agatccgtcg attccgggag gatggcttcg 660 tcaacgccgg tccggtcttc gccgacgaca cgatcgcgcg gctcaaggcc ggtgccatcg 720 acctgatctc ccgcttcacc gagcacggcc acgtctcgga cgactactgg aactacgacg 780 tcgagaacga ggctccggtg ctctaccgga tccacaacct ggagaagcag gactgggccg 840 aacgcgagct gctgttccgc ccggaactgg cggagctggc cgccgcgttc gtcgggtcac 900 cggtcgtgcc caccgccttc gccctggtcc tgaaggagcc gaagagggcg gcgggcgtcc 960 cgtggcaccg cgaccgggcc aacgtcgcac cgcacacggt ctgcaacctg agcatctgcc 1020 tggacaccgc aggcccggag aacggctgtc tggaaggtgt tccgggctcc cacctgctgc 1080 ccgacgacat cgacgtcccg gagatccgcg acggtgggcc ccgagtgccc gtcccgtcga 1140 aggtgggcga cgtgatcgtg cacgatgtcc ggctcgtgca cggctcgggc cccaacccca 1200 gcgaccagtg gcgccgaacc atcgtcatcg agttcgcgaa tcccgcgatc tcactgccga 1260 gcctcccgtc ctgaccggcc ggacgcggat gccgcctgcg gcgaccccac cgggtgcctc 1320 cgcgaacctg cgcggcccgc gccgcgcacg tgcccacatt ggccgcggca gatcgactct 1380 gccgcggccg atgtcccggc cgatccgtcc ggtgtcccgg ctcgtcagct agctcttcga 1440 cgagagcagc ccggtcacgt ggtcggcgat ggccgtcacg gtgggttgct gccagaagac 1500 ggtggtgggt aggcccagcc cgagtcgttt ctccagccga cgccggatca ccaccgtcat 1560 cacggagtcc aggccctgat ccgcgagcgg gcgcctcgga tgcaggtcgt ccaccgcgag 1620 ccgcatctcg gtcgcgatct ggagacgaac ctcctccagc acccgttccc gcagctcggc 1680 cggcggcagg tcggccaggg acatcgccgg ctccgcgggg ctcgcctcct cggtgccggc 1740 cggggcgggc ggctcgtcga tcaccgggta gcgcaggccg ggcagcgcgg ccaggacccg 1800 gccgtccgcg gtggccacga gggcgtccac cgtgtcgggg cggtcctcgt cgagccgcac 1860 ctcgacgagg acggtctcgg gcggcgggcc actcgtcgcc acctcgtcga tctgcaccac 1920 catccgcaac tgcgggatgc cgggaaacgc cgccggcgcg atcgacatca ccgcgtccag 1980 caccgacgcc caggtggacg tctcggcggt gtggacgcgg gcccggagga caccgtaccc 2040 ggagagcagc tgatccaccg accaaccgaa accggtcgac gggacaccca cctcagcgag 2100 ccggcgatgg atcgacccgg ggtcggccgg ctccagtcgg tactgctcgg ggtccacgag 2160 ggtccgtccc 
gtgagcgccg ccgcagcacc gtcggccacg gtcgcgtcgg cgtggaccag 2220 ccacggcgga tcctcgccgg catccccgcc ggtggcccgg gaggcgagcc ggacggcgtc 2280 ggcctcctgg atgacctgga tctcccgcag gtccgccgtc atcagcgggt gccgcatcgc 2340 cacgtcggcc aggaccggtg gcacgccgtc ccgctcggcc gccgccagga acgtcaccac 2400 gagcacggcg gcgggcacga tctcgacgcc gttgaggctg tggctgcccg ggtagggccg 2460 gttggagtcg tccaggctgg tctcccacac ccgcaccgcg ctccccgcca ggctgcgccg 2520 cgcaccgagc agggtgtgcg aatcagggtc gtggccgcgc cccccgctcc gggagaccgg 2580 ggcgggatag tgccagtggc tgcggtgccg ccaccggtag accggcaggg tgaccagttc 2640 tcccgacggg tgcagggccg tccagtcgac gggcaccccg atgcagtgca tcccggcgag 2700 tgcggtcagg aagccgcgta cctcggactg atcgcggcga agcgtgacgc cgacgtacgc 2760 ctcgtcgtcg gaaccgccca acgtctcgtg gatcgagtgg gtgaccaccg gatgcggcga 2820 cacctccacg aaggcacgga agccatcggc gaacgccgcg gtgaccgcgg cggcgagccg 2880 cacgggctgg cgcaggttcc cggcccagta ggccccgtcg gccgtcatcg ccgcacgcgg 2940 gtcgtccagg gcggtggagt agacccggat ccgcgggctg tgcggcgtga agtcgacggc 3000 ggcggtcagc tcgtcgagca gcggatccat gtgcgggctg tggaacgcca cgtcggaggc 3060 caccctgcgg gtcaccagcc gctcggcatc ccactgggcg atcagcgcat ccagggccga 3120 gggatcgccg gaaaccaccg tcgacgacgg cgacgacgcg atggccgcca ccacgtcgct 3180 gcgaccggcc aaccgctcgg cgacctcctc gaacggcaac gacaccatgg ccatcgcgcc 3240 ctggcccgcg acgcgtcgca ggagcgccga tcgccggcag atcaaccggc cgccgtcctc 3300 caccgtcagc aggccggcgg tgaccgcggc ggcgatctca ccgacggagt ggccgatgac 3360 ggcgtccggg gtcacgcccc gcgaccgcca catcgcggcc agtcccagct gcatgacgaa 3420 gatcatcgtc tggatccggt cgaccgcgtc gaactcaccg tccagcaacg cctgccgtgg 3480 cgagaagccg atctcctcca ggaagacggc ttccagggag tcgaccaccc ctgcgaacgc 3540 cggttcggtg acgagtagtt cccggcccat ccccgcccac tgggaaccgt ggccggaaaa 3600 gacccagagg agcttcggag ggtccccgag cggcgatccc gtgaccacgc cgtcgaccgg 3660 ttcgcccgcg gccaggccgc gtagcgcggc accgagaccg tccgcgtcgg cggccacggc 3720 gaccgcccgg tacgcgagat gcgaacgccg catcgccagg gtgtgtccca cggaggccag 3780 gtcggcgtcc cgggagagcc agccggcgag cgccgacgcc tgctcgccca acgacgccgc 3840 cgaggacgcg gagaccggga 
acagggactc accggtcagc gggctgcgct cccgtcgcgt 3900 gcgcggtggg gcctgcccga gcacgacgtg ggccaccgtc ccgccgtacc cgaagccgga 3960 cacgccggcc cggcgcggtc tcccgcgatc gggccaggac tggtgacggg tcaccacgcg 4020 gacgttcagc gcgtcccagg cgatggccgg atcgacgtcg gtgacgaccg gggtggccgg 4080 tatctcggcg cggtccaggg cgagcaccgc cttgatcact ccggcgatgc ccgcggcgcc 4140 ttccagatgg ccgatgttgg ccttgaccga accgatcagg cagggctcgc cgtcggcgcg 4200 agcgtgcccg tacacggcac cgatcgcggc ggcctccatc gggtcgccga gcggggtgcc 4260 cgtgccgtgc gcctcgacgt agtcgaccga gccgggcgct atcccgccgc tgcgcagggc 4320 ccgttccatc acgtgctcct gggcctgccc gcacggggcc atgatcccgt tggtgcgacc 4380 gtcctggttc acggcgctgc cgtgcagcac cgccaacacc cggtctccgt cgcgctcggc 4440 atcggcgagg agcttcagca cgacgacgcc gcagccctcg ccgcggccgt acccgtcggc 4500 ggtggcgtcg aaggacttgc tccgcccgtc cggggccagc gcacccgcgg cgccgagagt 4560 gatggactgg cctggggaga cgatgagatt gacgccgccg accagagcca ccgtgctctc 4620 gcccagccgc aggctctgcg cggcgaggtg cagggccacc agcgaggccg aacacgcggt 4680 gtccacggtg agactcggcc cccgcaggtc caggacgtgg gagacgcggt tggagagcgc 4740 gcaggcggcg gcgccgatcc cggtccaggc gtcgatgtac gggaggttct cgagctggtg 4800 ggcaccgtag tcgtaggtgc aggcaccggc gaagacaccg gtgtccgtgc cggccagctc 4860 gcgcggtgcg atgcccgcgt gttccagggc ctgccaggcc acctccagca gcagtcgttg 4920 ctggggatcc atcagctcgg cctcgcgtgg cgagatgccg aagaagtcgg catcgaagcc 4980 gtcgatctca ttcaggaagc tgcccgaacg gttagcccgg cgtacggcgt tctcgaactc 5040 gggccccagg tcccggtacg gctcccatcg gctggccggc acttcgccgg tggtgttgtg 5100 cccgccggcg agcaggtccc agaagccgtc cggggaattg acgtcgccgg ggaaccggca 5160 accgatgccg atgaccgcga ccgatggaga ggcccccgcg agcggcttga ttgctttcgg 5220 ttccacgaac atcccctgtt gtcgattgcc tgaacggacg gtcgggcggg catcgcctcg 5280 ggcgggttac gccccgcggc gcatcgagga ctgccgcgac cgggccggtc gccgcgactc 5340 cgggcggcgc accgcgcgtc agactccata tcccatacgg cgatccaacg aactctacaa 5400 acggtctata tgacagtgat atagaacttc tcctagacat tttctgacgc gcccccggca 5460 gggcctcccg cgaggcgatc cggaaccgga ctcccgcccg gtcgtgcggg tgcccggctt 5520 gcggtggttg gtgtggacgc gccccgggca 
tgggcggcgg gcgacgccgc cgatccgatg 5580 accatcggcg aatagggaaa gacctagaat ccgccggcga gatgtcggat cggcagccga 5640 gggctcgatc ggtcacgatg gaccgtgcga ggatcacgcg gatgttgtcg aaggcacggg 5700 agtcattgac gatgaccgaa tacggtgcca tcgcgctgga tgtcggtggg gtcatctatt 5760 acgatgagcc gttcgagctg gcgtggctgc aagcgacgta cgatctgctc cgatctgacg 5820 acccggcgat cacccggtcc gttttcatcg agcatgtcga gcgtttctat cactccccgg 5880 acgacggagc cgcaggccgg acgctgttgc attcgccggc cgccgcccga gcctgggcgc 5940 agattcgccg ggcctggcac gaactcgccc aggagatgcc gggcgcggtc cgggcggcgg 6000 tgacgctggc ccgcgaggtc ccgacggtga tcgtcgccaa ccagcccccg gagtgcgcgc 6060 gggtgctgga tgcctgggga ctgacagagg cctgcgcggg cgtcttcctc gattcgctcg 6120 tgggcgtcgc caagccggat ccggcgctgc tgggaatcgc cctggaacac ctcggtgtcg 6180 cccccgccga cctgctggtc gtcggcaacc ggcacgacca cgacgtcctg ccggcgcggg 6240 cgctcggctg cccggtcgcc ttcgtccgcg cggaccccgg ctaccggccg ccgtccggcg 6300 tccaccccga tctgatccgg gcgtacacgt cgctccgcgc cgtccggacc gcgccgccgg 6360 ccggtgacga cgaacgggtg tccgtcgtcg ccacgctggc ggccctggct cgttcctcgg 6420 ccacgggcct gcgcccggtc actcgcgccg agtcgtcgtg acggcagccc aggtgaggag 6480 atgcccgacg gccgtcatcg gcgccaccgg gttcatcggg tcccggctcg tggcccaact 6540 gacccgcgcg gggcacccgg tcgcccgctt caaccaggcg cacccgccgg tggtcgacgg 6600 gcgcccggct gccggcctgt gcgacgccga gatcgtactg ttcctcgccg cacggttgag 6660 cccggcgctc gccgagcgcc atccggaact gatcgtcgcc gagcgcaggc tgctcgtcga 6720 cgtcctgacg gccctgcggc actccgcccc cttcccggtg ttcgtactgg ccagctcagg 6780 cggcacggtg tactcgccga acgcgtgccc gccgtacgac gaatcggcgt tgaccaggcc 6840 cacgtcggcg tacgggcgcg ccaagctcgg gctggaacgc gaactgttgg gtcacgccga 6900 ccatgtccgt cccgtgatcc tgcggctcag taacgtctat gggcccggcc agcgcccggc 6960 gcacggctac ggcgtgctgt cgcactggct ggacgccgcg gccaggcggc agccgatccg 7020 ggtcttcggt gatccggagg tggtccgcga ctacgtgcac gtggacgacg tcgccgagat 7080 cctcaaggcc gtgcaccgcc gtacggtcac taccggtccg gagggaatcc cgaccgtgtt 7140 gaacgtcggc tcaggggcgc ccacctccct ggccgatctg ctcgcggtgg tgtcgacagt 7200 ggtcgaccag cggatcgagg tgatctggga aggcggtcgc 
cagttcgaca gaggtggcaa 7260 ctggctggac tcctcgttgg cacacgagac cctcggctgg cgggccagga tcggtctgac 7320 ggacggcgta cgtgaatgct gggaacacgt gctcgcgcat cagaccgccg ccgagcgatg 7380 atcacgcccc cacctcagca ggagctgatc atgaaggacg ccccacgcgg tcacggcacc 7440 gtagagacac gccagcggca ggcgcaggcg ggactccggc gcggtgcgat gccggtccca 7500 ttccttccgg aaccccgccc acgcctggcc acgccgtgac tcgcaccgac cctgccagta 7560 cgcccgccgc agcaggtagc gcagggtcag ccgggcggga tccacgtcgt gggtcacgga 7620 gcagtccggc aggagttgct cccgggcccc cgcgtccttc atgagcttga cgaacgtggt 7680 gtcctcgccg gactggaggt tgccacccgt ccggctcaac gcgagatcga agtcgaggcc 7740 cttggcatgg gcgaaggcgg tgtccaccgc catacaggca ccccagatct tgatctcgcg 7800 gtcgtcccgg tgccagccga gcaggtggaa ctgaccggac gtcacgtacc agggcagggg 7860 gcgtggtgga cgggcgagcc gagtgccgac gacgtgggtg cccgcgcgca ggctctcgcg 7920 gaccgccgtg acggccttgg cgtccagccg cacgtcgtcg tcgacgaaca tcacgtggtg 7980 attcggccac cgggccaaca tgagattccg cgaggcggac aggccgccgg tggcgccgag 8040 aacgcgcatc gttccgccgg cggcatccac ctcggccgcg accgactccg cctccggagt 8100 gctcggccgg tccaactgca cgaagtactc gtcaccggac agttgggcca ggttgtggtg 8160 taggtgctta cgcacattct cgacactgaa tgcgcatata gccactacca tcggataatc 8220 ggagggccct tttttcttcg tttccatgag acctcgaatc gtccctgccg atgggtcatg 8280 gggttgcacg ggctggtttc cgttccgttc agtcgagcct ttcccggcaa acctccgggg 8340 gccaggtccc caccgaaagg atgcccatcg agtacacctt ggcgatgagc gcgggccgat 8400 tgggcacacg tagccgctgc aacaacttgc tgacgtggta ttcgacgccc tggcggctca 8460 ggtagacctt gttcgcgata tgcacggtgc gctcacccgc tgcgatgctt tcgatgatgc 8520 gggcatccaa ctcggagagg ggaagcttca gacccacagt ttcatctcca atccctacca 8580 tgacttgatt cgcaagacga gtcgatggaa cggcgcactg cgtcacggtg tcgtcttcca 8640 acgcttaagt caggccgaac cggccgtgaa tcagccggac gcaggcgtgc tcaacccgat 8700 ggaggccacc gaaccggcgg ccggccgacg ttacgcggtg ctgggtccgg acattcggca 8760 gaaagcgtcg cgcgaccagc ggattccgag acgattggtt gccgggtgcg gcaagccggc 8820 gcagccggtt cacccgccat cggagttgac ctgaaggtgt cggaaaacct agctgacagt 8880 aaacatcccg tagcagtcgc acccccgctt tgcctgcgat cgatacgtag 
gtcatccgtg 8940 tggccactcc cagaactgac ctaacgtggc agtagtgtaa ccgaaagttg cacgtatcgc 9000 ctgccccgat cgggtaaatg atcgacggtt gtcgctctct gatcggaatt gacccatgcg 9060 ggtccatcga tgcctgcgtt cgggggtacg ccctgctccg ggccgcgtgc ggggtggtcc 9120 agcggcttgg ccgcaggggc caccgatgga tcgaacgcta ccctgagtcg cggagtgaca 9180 actatgagct cgtgctgagg cttagtgaaa cacctagtct ccggggcctg gatcgtccat 9240 tgagctgggt gttttcgcca ttgatgaccc ctagagggct aaggcggctt tcgagtcgtc 9300 cgaacgcttc gcccgttctg cggggcccct caagggggcc ggcgcgggcc tccccggtgc 9360 ggccgtggtc tcgccgcgcc tcagacctcg tggggcagcc gcacgcgcgg ttgccctgat 9420 ttgacccgaa tttcatcgat caggcgagcc cgggcgcaga tctccaccgt ctcggcggca 9480 cgctgacccg cgacggcgac ggctccgacg aaggcacgca gcgtgttggc gaactgatcc 9540 tccggcggaa gggtcagttc ccgcctgacc tcctcgtgtt ccagcctgat gacgggccgt 9600 cgcgtggtag ggggcgtgta ggcgcgttcg acgacgatcc gcccctcgct gccccacagc 9660 acatagtccg accggtacgc gtgctcgaag ccgaaggtca actgggcggt ccggccgtcc 9720 ggagtggaca gcagggcgct gccggacacg tcgacgccgc gttcggggtc cagtttcagc 9780 gtcgcgccga cgacctccag ctcgggaccg aggaagaggc gggcggcgct cagcggatac 9840 atgcccgtgt cgagcagtgc cccaccgccc agatccggcc ggtagcggat gtcgtccggg 9900 ccgaggggag ggaatccgaa cgcggcgttg acctcgcgca gttcgccgat ctcgccgccc 9960 tccagcagat ccaggaccgc gctgtgcaga ccatgccgga gaaaggtcag attctccatc 10020 agggcgaggc cgagggaccg ggcggtttcc accatcgcca cggtgtgggc gtaccgtgtg 10080 gtcaacggct tctccaccag gacgtgctta ccggcctcaa gcgctcggcc gacccattcg 10140 tggtgcaacc cggcgggcag gggaatatac accgcgtcca cgtccggccg gctcagcagc 10200 gagcggtaat cggcggccgc cgcgcagccg aactggccgg cgaacgcacg ggccttcgcc 10260 tgttctcggg ccgcgacggc gacgagttcg gtcgtcggct cacgcaggat cgccggcagg 10320 gtccgccgcc gtgcgatgtc ggcacatccc agtacgccga accggatcgg atcgttcacc 10380 accgctcgct ggggcacctc actcaccaca ggctgtgcag gcaggcaagc agactgcgtg 10440 cttcgacgtt gaggtagtag ccgtgccgca gcagcgtggc gagctgatgg acggcgaccc 10500 agcagaaggt gtccggcacc gccggaaggt cgtcgccgac ctccaccagc atgtagcggt 10560 tctcggcgcg gtagaaccgg ccgccctcct ccgcgaggac ggtgtcgaac 
aggatccgct 10620 cgctcggggc gttgagtatc aggtcgagga aaggaggccg ctgttcggca tccgaactcg 10680 gcgtgcactg gaccgtgggt cccatctcca tggcatccag cagccccacc tggaaccgag 10740 cgtgcaccag cacgtgcgcc accccattga tgatcctgag cgcgaatgcg atgatccccc 10800 gctgccgcgg atgcagcaac ggttgactcc actcggtgac ctcacggttg ttgatgcgta 10860 ccgtgacccc gacgatgctg aagtgtcgcc cgtcctcgcg ggagatctca tacggcgtgt 10920 gccgccagcc gcgcaggtcg cggagcgaga tccggctcac cgccagctcg tgccggctct 10980 tggcctcggt gaaccagctc agcaccgcct ccgtggagcg gtgcgacgga ccctcgccgc 11040 tggcggaccg ggccagcgcg gcggccatgg cggagttcgc ggggaccact cccgaaccgg 11100 cgaagaaggt cgacggcagg caggacagca cggaacgggt gtccatgttg accagaacgt 11160 cgatgcgcag gagccgccga acctcgtgca gcggcagcca gtagtggtag tcggagggcg 11220 gcacgtcgtc gaccagcacg accatgttgc ggttccgctt gtgcaggaac caggagccct 11280 gctcggactg caacacgtcg accaggaccc ggccggcgcc cggccgggtg aagtattcga 11340 gatacctcgt gccgccgccc ccgtggaccc gcgtgtagtt gctgcgggtc gcctgcacgg 11400 tgggcgacag ctgcatcatg ttgatgttgc ccggctcgac cttcgcctgc aacaggcagt 11460 gcgggacgcc gtcgacgagc ttcatcaaca tgccgagaat cccgatctcc ggctgattga 11520 tgatcggctg gtaccactcc gccaccgcac cgtacgtcgt tcgcacgtgc acgccctcga 11580 ccacgaagaa gcggccgctc tcgtgcgcca ggttgccggt ggtctcgtcg aacgcccaac 11640 cacgcagctc gtccagccgg atccgctcca cctcgcagga ggtcatcgtc gaccgttcgg 11700 caagccagga ccggaaggcg gggctcaccc cgcccgccgc cggctcccac cactccatcg 11760 ggtcatcccc ggacgtcatg agcttgccct cggacaacgc gggaagctcg ctcaccacag 11820 atcggccagc tcgcgaatga cgccgccctc actgcactgc ggacccgaca gcagacggag 11880 atcgctgtcg accatcatcc cgaccagctc ctcgaagtag accgtcggct tccagccgag 11940 gcgctgctgg gccttcgtcg cgtcggcgca gagcaggtcg acctcggccg gacgctggag 12000 cgcctcgtcg aggacgacgt ggtcacgcca gtccaggtcg acgtgggcga aggccagctc 12060 gaccagctcc cgcacgctgt gcgcgatccc ggtacccagg acgtagtcgt caggctcgtc 12120 ctgggcgagc atcatcgaca tgccgcggac gtagtcgccg gcgaaacccc agtcccgctc 12180 ggccatcaga ttgcccagtc gcaacgagtc ccggagcccc agtttcacgg ccgccgcgcc 12240 cagcgacacc tttcgcgtca cgaactccgg 
tccgcggatg ggtgattcat ggttgaagag 12300 catgccggag accgcgtaca tgccgtacga ctcgcggtag ttctgcaccg tgtagtggcc 12360 gaacactttg gcgacgccgt acggactgcg tgggtggaag ggcgtcagct cattctgggg 12420 ggtctcccgc accttgccga acatctcgga cgacgaggcc tgatagaagc gtggccgact 12480 cgggccgggg gtacgcgagc tggtgatgcc tgcgacgatc cggacggctt cgagcatgcg 12540 caccacgccc gtcccggtga tctccgcggt ggtgttgggc tgccgccacg aggtggggac 12600 gtaggacagt gcgccgaggt tgtagatctc gtccggccga accctgtcca ccgccgagat 12660 caggctcgac tggtccatca ggtcgccgtt gacaagccga acgtcggggt gcacctgccg 12720 gccgcagcgc gcgctcggcg aattctggcc ccgcaccatt ccatagacgt catatccagc 12780 ggccaggagg tgccgcgcaa gataagtacc gtcctgaccg gtaatccctg tgatcagcgc 12840 tcgcctagtc aggataatct ccagcccctg tgaccaaccc tcgatgtgat cgcgtcgagg 12900 gatggcgaac taccgggttg ccccgaggaa aggcatgtcc cgttgccgtg actcacggta 12960 ctggaaaatg gagcagggat cacccttctc gaatgcaata tagggagctc actagagggt 13020 gcagccgtgc gcgaagaacc ggcaatccgt actgcttacg ggtgggccgg cccggcgtcg 13080 ttgaggtcgg ccgccacgaa cagggccgcc gctcgtgtgc ccaccctcag ttttcgccgg 13140 gggcgtgccg ccccgatcgg cggaacccgg cggccccgca cgatcaccgc gatcaccgcg 13200 gccggtcgaa gcaccgaagc gacggcggtg agcattccgt gcaggccccg accggcatga 13260 ccgccggtcc actccgtctg ttcgccgtac gggacgtccg tcacgacgag atccggttcg 13320 ctgcccccga ccgccgaggc cagcgccgtc gggtcgaaga cgtcggcctg tcggacggcg 13380 tacgggaggg ggccgccggc agcctcgagc cggcggctca gccgtccggc cgcggccgcg 13440 gcctcggcgt agccgggctt gtcgaagcgt cgagcccgct cggtcaactc cgcagcccgt 13500 gccgccaggc caccctcggc gagcaggccg acattggctg ccgccaatgt cagggccgcg 13560 tcggcgtcga tgtccgaggc cagcagcctc gcgatcgacc gccggtgcag aatcccgagc 13620 acggtcagca ggtaaccgct gccgcagcac ggatcccaca ggaccgccgg gtcggtgccg 13680 ccgcgcacgt cgagggcgtg ctggaacacc tcggacgcga gccgcacggg gaaggcggga 13740 aagcccgggg cggaccggag aaccgccccg cttgcaagat cgccgtagtc gtcccgtgtc 13800 gtctcgtacc cgatagccca ccctgctccg atctctccgc acgccagggt agcaatgggt 13860 gtggccggtg ccgcagcacc gctacgcacc cgccgggaac gagcttgagg ccggcccgct 13920 catgcctcga 
cgtcgcaacc gtcctggtcc ctgatccact ggtaggccct ggcgattccc 13980 tcggcgaggg ctgtccgtgc ggtccagttc aactcccggc cggcccgggt cacatcgagg 14040 gcggaatgct ggagctcgcc gagacgggcg g 14071 <210> SEQ ID NO 37 <211> LENGTH: 354 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 37 Met Glu Val His Gly Ala Thr Asn Val Glu Leu Arg Ile Ala Asp Val 1 5 10 15 Glu Thr Leu Asp Phe Ala Thr Leu Gly Arg Phe Asp Ala Val Leu Cys 20 25 30 Ala Gly Leu Leu Tyr His Val Arg Glu Pro Trp Ala Leu Leu Lys Asp 35 40 45 Ala Ala Arg Val Ser Ala Gly Ile Tyr Leu Ser Thr His Tyr Trp Gly 50 55 60 Ser Ser Asp Gly Leu Glu Thr Leu Asp Gly Tyr Ser Val Lys His Val 65 70 75 80 Arg Glu Glu His Pro Glu Pro Gln Ala Arg Gly Leu Ser Val Asp Val 85 90 95 Arg Trp Leu Asp Arg Ala Ser Leu Phe Ala Ala Leu Glu Asn Ala Gly 100 105 110 Phe Val Glu Ile Glu Val Leu His Glu Arg Thr Ser Ala Glu Val Cys 115 120 125 Asp Ile Val Val Val Gly Arg Ala Arg Leu Gly Ala Gln Ile Arg Arg 130 135 140 Phe Arg Glu Asp Gly Phe Val Asn Ala Gly Pro Val Phe Ala Asp Asp 145 150 155 160 Thr Ile Ala Arg Leu Lys Ala Gly Ala Ile Asp Leu Ile Ser Arg Phe 165 170 175 Thr Glu His Gly His Val Ser Asp Asp Tyr Trp Asn Tyr Asp Val Glu 180 185 190 Asn Glu Ala Pro Val Leu Tyr Arg Ile His Asn Leu Glu Lys Gln Asp 195 200 205 Trp Ala Glu Arg Glu Leu Leu Phe Arg Pro Glu Leu Ala Glu Leu Ala 210 215 220 Ala Ala Phe Val Gly Ser Pro Val Val Pro Thr Ala Phe Ala Leu Val 225 230 235 240 Leu Lys Glu Pro Lys Arg Ala Ala Gly Val Pro Trp His Arg Asp Arg 245 250 255 Ala Asn Val Ala Pro His Thr Val Cys Asn Leu Ser Ile Cys Leu Asp 260 265 270 Thr Ala Gly Pro Glu Asn Gly Cys Leu Glu Gly Val Pro Gly Ser His 275 280 285 Leu Leu Pro Asp Asp Ile Asp Val Pro Glu Ile Arg Asp Gly Gly Pro 290 295 300 Arg Val Pro Val Pro Ser Lys Val Gly Asp Val Ile Val His Asp Val 305 310 315 320 Arg Leu Val His Gly Ser Gly Pro Asn Pro Ser Asp Gln Trp Arg Arg 325 330 335 Thr Ile Val Ile Glu Phe Ala Asn Pro Ala Ile Ser Leu Pro Ser Leu 340 345 350 Pro Ser <210> SEQ ID NO 38 <211> LENGTH: 1267 <212> 
TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 38 Met Phe Val Glu Pro Lys Ala Ile Lys Pro Leu Ala Gly Ala Ser Pro 1 5 10 15 Ser Val Ala Val Ile Gly Ile Gly Cys Arg Phe Pro Gly Asp Val Asn 20 25 30 Ser Pro Asp Gly Phe Trp Asp Leu Leu Ala Gly Gly His Asn Thr Thr 35 40 45 Gly Glu Val Pro Ala Ser Arg Trp Glu Pro Tyr Arg Asp Leu Gly Pro 50 55 60 Glu Phe Glu Asn Ala Val Arg Arg Ala Asn Arg Ser Gly Ser Phe Leu 65 70 75 80 Asn Glu Ile Asp Gly Phe Asp Ala Asp Phe Phe Gly Ile Ser Pro Arg 85 90 95 Glu Ala Glu Leu Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Val Ala 100 105 110 Trp Gln Ala Leu Glu His Ala Gly Ile Ala Pro Arg Glu Leu Ala Gly 115 120 125 Thr Asp Thr Gly Val Phe Ala Gly Ala Cys Thr Tyr Asp Tyr Gly Ala 130 135 140 His Gln Leu Glu Asn Leu Pro Tyr Ile Asp Ala Trp Thr Gly Ile Gly 145 150 155 160 Ala Ala Ala Cys Ala Leu Ser Asn Arg Val Ser His Val Leu Asp Leu 165 170 175 Arg Gly Pro Ser Leu Thr Val Asp Thr Ala Cys Ser Ala Ser Leu Val 180 185 190 Ala Leu His Leu Ala Ala Gln Ser Leu Arg Leu Gly Glu Ser Thr Val 195 200 205 Ala Leu Val Gly Gly Val Asn Leu Ile Val Ser Pro Gly Gln Ser Ile 210 215 220 Thr Leu Gly Ala Ala Gly Ala Leu Ala Pro Asp Gly Arg Ser Lys Ser 225 230 235 240 Phe Asp Ala Thr Ala Asp Gly Tyr Gly Arg Gly Glu Gly Cys Gly Val 245 250 255 Val Val Leu Lys Leu Leu Ala Asp Ala Glu Arg Asp Gly Asp Arg Val 260 265 270 Leu Ala Val Leu His Gly Ser Ala Val Asn Gln Asp Gly Arg Thr Asn 275 280 285 Gly Ile Met Ala Pro Cys Gly Gln Ala Gln Glu His Val Met Glu Arg 290 295 300 Ala Leu Arg Ser Gly Gly Ile Ala Pro Gly Ser Val Asp Tyr Val Glu 305 310 315 320 Ala His Gly Thr Gly Thr Pro Leu Gly Asp Pro Met Glu Ala Ala Ala 325 330 335 Ile Gly Ala Val Tyr Gly His Ala Arg Ala Asp Gly Glu Pro Cys Leu 340 345 350 Ile Gly Ser Val Lys Ala Asn Ile Gly His Leu Glu Gly Ala Ala Gly 355 360 365 Ile Ala Gly Val Ile Lys Ala Val Leu Ala Leu Asp Arg Ala Glu Ile 370 375 380 Pro Ala Thr Pro Val Val Thr Asp Val Asp Pro Ala Ile Ala Trp Asp 385 390 395 400 Ala Leu Asn Val Arg Val Val Thr Arg 
His Gln Ser Trp Pro Asp Arg 405 410 415 Gly Arg Pro Arg Arg Ala Gly Val Ser Gly Phe Gly Tyr Gly Gly Thr 420 425 430 Val Ala His Val Val Leu Gly Gln Ala Pro Pro Arg Thr Arg Arg Glu 435 440 445 Arg Ser Pro Leu Thr Gly Glu Ser Leu Phe Pro Val Ser Ala Ser Ser 450 455 460 Ala Ala Ser Leu Gly Glu Gln Ala Ser Ala Leu Ala Gly Trp Leu Ser 465 470 475 480 Arg Asp Ala Asp Leu Ala Ser Val Gly His Thr Leu Ala Met Arg Arg 485 490 495 Ser His Leu Ala Tyr Arg Ala Val Ala Val Ala Ala Asp Ala Asp Gly 500 505 510 Leu Gly Ala Ala Leu Arg Gly Leu Ala Ala Gly Glu Pro Val Asp Gly 515 520 525 Val Val Thr Gly Ser Pro Leu Gly Asp Pro Pro Lys Leu Leu Trp Val 530 535 540 Phe Ser Gly His Gly Ser Gln Trp Ala Gly Met Gly Arg Glu Leu Leu 545 550 555 560 Val Thr Glu Pro Ala Phe Ala Gly Val Val Asp Ser Leu Glu Ala Val 565 570 575 Phe Leu Glu Glu Ile Gly Phe Ser Pro Arg Gln Ala Leu Leu Asp Gly 580 585 590 Glu Phe Asp Ala Val Asp Arg Ile Gln Thr Met Ile Phe Val Met Gln 595 600 605 Leu Gly Leu Ala Ala Met Trp Arg Ser Arg Gly Val Thr Pro Asp Ala 610 615 620 Val Ile Gly His Ser Val Gly Glu Ile Ala Ala Ala Val Thr Ala Gly 625 630 635 640 Leu Leu Thr Val Glu Asp Gly Gly Arg Leu Ile Cys Arg Arg Ser Ala 645 650 655 Leu Leu Arg Arg Val Ala Gly Gln Gly Ala Met Ala Met Val Ser Leu 660 665 670 Pro Phe Glu Glu Val Ala Glu Arg Leu Ala Gly Arg Ser Asp Val Val 675 680 685 Ala Ala Ile Ala Ser Ser Pro Ser Ser Thr Val Val Ser Gly Asp Pro 690 695 700 Ser Ala Leu Asp Ala Leu Ile Ala Gln Trp Asp Ala Glu Arg Leu Val 705 710 715 720 Thr Arg Arg Val Ala Ser Asp Val Ala Phe His Ser Pro His Met Asp 725 730 735 Pro Leu Leu Asp Glu Leu Thr Ala Ala Val Asp Phe Thr Pro His Ser 740 745 750 Pro Arg Ile Arg Val Tyr Ser Thr Ala Leu Asp Asp Pro Arg Ala Ala 755 760 765 Met Thr Ala Asp Gly Ala Tyr Trp Ala Gly Asn Leu Arg Gln Pro Val 770 775 780 Arg Leu Ala Ala Ala Val Thr Ala Ala Phe Ala Asp Gly Phe Arg Ala 785 790 795 800 Phe Val Glu Val Ser Pro His Pro Val Val Thr His Ser Ile His Glu 805 810 815 Thr Leu Gly Gly Ser Asp Asp Glu Ala Tyr 
Val Gly Val Thr Leu Arg 820 825 830 Arg Asp Gln Ser Glu Val Arg Gly Phe Leu Thr Ala Leu Ala Gly Met 835 840 845 His Cys Ile Gly Val Pro Val Asp Trp Thr Ala Leu His Pro Ser Gly 850 855 860 Glu Leu Val Thr Leu Pro Val Tyr Arg Trp Arg His Arg Ser His Trp 865 870 875 880 His Tyr Pro Ala Pro Val Ser Arg Ser Gly Gly Arg Gly His Asp Pro 885 890 895 Asp Ser His Thr Leu Leu Gly Ala Arg Arg Ser Leu Ala Gly Ser Ala 900 905 910 Val Arg Val Trp Glu Thr Ser Leu Asp Asp Ser Asn Arg Pro Tyr Pro 915 920 925 Gly Ser His Ser Leu Asn Gly Val Glu Ile Val Pro Ala Ala Val Leu 930 935 940 Val Val Thr Phe Leu Ala Ala Ala Glu Arg Asp Gly Val Pro Pro Val 945 950 955 960 Leu Ala Asp Val Ala Met Arg His Pro Leu Met Thr Ala Asp Leu Arg 965 970 975 Glu Ile Gln Val Ile Gln Glu Ala Asp Ala Val Arg Leu Ala Ser Arg 980 985 990 Ala Thr Gly Gly Asp Ala Gly Glu Asp Pro Pro Trp Leu Val His Ala 995 1000 1005 Asp Ala Thr Val Ala Asp Gly Ala Ala Ala Ala Leu Thr Gly Arg 1010 1015 1020 Thr Leu Val Asp Pro Glu Gln Tyr Arg Leu Glu Pro Ala Asp Pro 1025 1030 1035 Gly Ser Ile His Arg Arg Leu Ala Glu Val Gly Val Pro Ser Thr 1040 1045 1050 Gly Phe Gly Trp Ser Val Asp Gln Leu Leu Ser Gly Tyr Gly Val 1055 1060 1065 Leu Arg Ala Arg Val His Thr Ala Glu Thr Ser Thr Trp Ala Ser 1070 1075 1080 Val Leu Asp Ala Val Met Ser Ile Ala Pro Ala Ala Phe Pro Gly 1085 1090 1095 Ile Pro Gln Leu Arg Met Val Val Gln Ile Asp Glu Val Ala Thr 1100 1105 1110 Ser Gly Pro Pro Pro Glu Thr Val Leu Val Glu Val Arg Leu Asp 1115 1120 1125 Glu Asp Arg Pro Asp Thr Val Asp Ala Leu Val Ala Thr Ala Asp 1130 1135 1140 Gly Arg Val Leu Ala Ala Leu Pro Gly Leu Arg Tyr Pro Val Ile 1145 1150 1155 Asp Glu Pro Pro Ala Pro Ala Gly Thr Glu Glu Ala Ser Pro Ala 1160 1165 1170 Glu Pro Ala Met Ser Leu Ala Asp Leu Pro Pro Ala Glu Leu Arg 1175 1180 1185 Glu Arg Val Leu Glu Glu Val Arg Leu Gln Ile Ala Thr Glu Met 1190 1195 1200 Arg Leu Ala Val Asp Asp Leu His Pro Arg Arg Pro Leu Ala Asp 1205 1210 1215 Gln Gly Leu Asp Ser Val Met Thr Val Val Ile Arg Arg Arg Leu 1220 
1225 1230 Glu Lys Arg Leu Gly Leu Gly Leu Pro Thr Thr Val Phe Trp Gln 1235 1240 1245 Gln Pro Thr Val Thr Ala Ile Ala Asp His Val Thr Gly Leu Leu 1250 1255 1260 Ser Ser Lys Ser 1265 <210> SEQ ID NO 39 <211> LENGTH: 303 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 39 Met Gly Gly Gly Arg Arg Arg Arg Ser Asp Asp His Arg Arg Ile Gly 1 5 10 15 Lys Asp Leu Glu Ser Ala Gly Glu Met Ser Asp Arg Gln Pro Arg Ala 20 25 30 Arg Ser Val Thr Met Asp Arg Ala Arg Ile Thr Arg Met Leu Ser Lys 35 40 45 Ala Arg Glu Ser Leu Thr Met Thr Glu Tyr Gly Ala Ile Ala Leu Asp 50 55 60 Val Gly Gly Val Ile Tyr Tyr Asp Glu Pro Phe Glu Leu Ala Trp Leu 65 70 75 80 Gln Ala Thr Tyr Asp Leu Leu Arg Ser Asp Asp Pro Ala Ile Thr Arg 85 90 95 Ser Val Phe Ile Glu His Val Glu Arg Phe Tyr His Ser Pro Asp Asp 100 105 110 Gly Ala Ala Gly Arg Thr Leu Leu His Ser Pro Ala Ala Ala Arg Ala 115 120 125 Trp Ala Gln Ile Arg Arg Ala Trp His Glu Leu Ala Gln Glu Met Pro 130 135 140 Gly Ala Val Arg Ala Ala Val Thr Leu Ala Arg Glu Val Pro Thr Val 145 150 155 160 Ile Val Ala Asn Gln Pro Pro Glu Cys Ala Arg Val Leu Asp Ala Trp 165 170 175 Gly Leu Thr Glu Ala Cys Ala Gly Val Phe Leu Asp Ser Leu Val Gly 180 185 190 Val Ala Lys Pro Asp Pro Ala Leu Leu Gly Ile Ala Leu Glu His Leu 195 200 205 Gly Val Ala Pro Ala Asp Leu Leu Val Val Gly Asn Arg His Asp His 210 215 220 Asp Val Leu Pro Ala Arg Ala Leu Gly Cys Pro Val Ala Phe Val Arg 225 230 235 240 Ala Asp Pro Gly Tyr Arg Pro Pro Ser Gly Val His Pro Asp Leu Ile 245 250 255 Arg Ala Tyr Thr Ser Leu Arg Ala Val Arg Thr Ala Pro Pro Ala Gly 260 265 270 Asp Asp Glu Arg Val Ser Val Val Ala Thr Leu Ala Ala Leu Ala Arg 275 280 285 Ser Ser Ala Thr Gly Leu Arg Pro Val Thr Arg Ala Glu Ser Ser 290 295 300 <210> SEQ ID NO 40 <211> LENGTH: 307 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 40 Val Thr Ala Ala Gln Val Arg Arg Cys Pro Thr Ala Val Ile Gly Ala 1 5 10 15 Thr Gly Phe Ile Gly Ser Arg Leu Val Ala Gln Leu Thr Arg Ala Gly 20 25 30 His Pro Val Ala Arg Phe Asn Gln Ala His Pro Pro Val Val Asp Gly 35 40 45 Arg Pro Ala Ala Gly Leu Cys Asp Ala Glu Ile Val Leu Phe Leu Ala 50 55 60 Ala Arg Leu Ser Pro Ala Leu Ala Glu Arg His Pro Glu Leu Ile Val 65 70 75 80 Ala Glu Arg Arg Leu Leu Val Asp Val Leu Thr Ala Leu Arg His Ser 85 90 95 Ala Pro Phe Pro Val Phe Val Leu Ala Ser Ser Gly Gly Thr Val Tyr 100 105 110 Ser Pro Asn Ala Cys Pro Pro Tyr Asp Glu Ser Ala Leu Thr Arg Pro 115 120 125 Thr Ser Ala Tyr Gly Arg Ala Lys Leu Gly Leu Glu Arg Glu Leu Leu 130 135 140 Gly His Ala Asp His Val Arg Pro Val Ile Leu Arg Leu Ser Asn Val 145 150 155 160 Tyr Gly Pro Gly Gln Arg Pro Ala His Gly Tyr Gly Val Leu Ser His 165 170 175 Trp Leu Asp Ala Ala Ala Arg Arg Gln Pro Ile Arg Val Phe Gly Asp 180 185 190 Pro Glu Val Val Arg Asp Tyr Val His Val Asp Asp Val Ala Glu Ile 195 200 205 Leu Lys Ala Val His Arg Arg Thr Val Thr Thr Gly Pro Glu Gly Ile 210 215 220 Pro Thr Val Leu Asn Val Gly Ser Gly Ala Pro Thr Ser Leu Ala Asp 225 230 235 240 Leu Leu Ala Val Val Ser Thr Val Val Asp Gln Arg Ile Glu Val Ile 245 250 255 Trp Glu Gly Gly Arg Gln Phe Asp Arg Gly Gly Asn Trp Leu Asp Ser 260 265 270 Ser Leu Ala His Glu Thr Leu Gly Trp Arg Ala Arg Ile Gly Leu Thr 275 280 285 Asp Gly Val Arg Glu Cys Trp Glu His Val Leu Ala His Gln Thr Ala 290 295 300 Ala Glu Arg 305 <210> SEQ ID NO 41 <211> LENGTH: 295 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 41 Met Glu Thr Lys Lys Lys Gly Pro Ser Asp Tyr Pro Met Val Val Ala 1 5 10 15 Ile Cys Ala Phe Ser Val Glu Asn Val Arg Lys His Leu His His Asn 20 25 30 Leu Ala Gln Leu Ser Gly Asp Glu Tyr Phe Val Gln Leu Asp Arg Pro 35 40 45 Ser Thr Pro Glu Ala Glu Ser Val Ala Ala Glu Val Asp Ala Ala Gly 50 55 60 Gly Thr Met Arg Val Leu Gly Ala Thr Gly Gly Leu Ser Ala Ser Arg 65 70 75 80 Asn Leu Met Leu Ala Arg Trp Pro Asn His His Val Met Phe Val Asp 85 90 95 Asp Asp Val Arg Leu Asp Ala Lys Ala Val Thr Ala Val Arg Glu Ser 100 105 110 Leu Arg Ala Gly Thr His Val Val Gly Thr Arg Leu Ala Arg Pro Pro 115 120 125 Arg Pro Leu Pro Trp Tyr Val Thr Ser Gly Gln Phe His Leu Leu Gly 130 135 140 Trp His Arg Asp Asp Arg Glu Ile Lys Ile Trp Gly Ala Cys Met Ala 145 150 155 160 Val Asp Thr Ala Phe Ala His Ala Lys Gly Leu Asp Phe Asp Leu Ala 165 170 175 Leu Ser Arg Thr Gly Gly Asn Leu Gln Ser Gly Glu Asp Thr Thr Phe 180 185 190 Val Lys Leu Met Lys Asp Ala Gly Ala Arg Glu Gln Leu Leu Pro Asp 195 200 205 Cys Ser Val Thr His Asp Val Asp Pro Ala Arg Leu Thr Leu Arg Tyr 210 215 220 Leu Leu Arg Arg Ala Tyr Trp Gln Gly Arg Cys Glu Ser Arg Arg Gly 225 230 235 240 Gln Ala Trp Ala Gly Phe Arg Lys Glu Trp Asp Arg His Arg Thr Ala 245 250 255 Pro Glu Ser Arg Leu Arg Leu Pro Leu Ala Cys Leu Tyr Gly Ala Val 260 265 270 Thr Ala Trp Gly Val Leu His Asp Gln Leu Leu Leu Arg Trp Gly Arg 275 280 285 Asp His Arg Ser Ala Ala Val 290 295 <210> SEQ ID NO 42 <211> LENGTH: 341 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 42 Val Ser Glu Val Pro Gln Arg Ala Val Val Asn Asp Pro Ile Arg Phe 1 5 10 15 Gly Val Leu Gly Cys Ala Asp Ile Ala Arg Arg Arg Thr Leu Pro Ala 20 25 30 Ile Leu Arg Glu Pro Thr Thr Glu Leu Val Ala Val Ala Ala Arg Glu 35 40 45 Gln Ala Lys Ala Arg Ala Phe Ala Gly Gln Phe Gly Cys Ala Ala Ala 50 55 60 Ala Asp Tyr Arg Ser Leu Leu Ser Arg Pro Asp Val Asp Ala Val Tyr 65 70 75 80 Ile Pro Leu Pro Ala Gly Leu His His Glu Trp Val Gly Arg Ala Leu 85 90 95 Glu Ala Gly Lys His Val Leu Val Glu Lys Pro Leu Thr Thr Arg Tyr 100 105 110 Ala His Thr Val Ala Met Val Glu Thr Ala Arg Ser Leu Gly Leu Ala 115 120 125 Leu Met Glu Asn Leu Thr Phe Leu Arg His Gly Leu His Ser Ala Val 130 135 140 Leu Asp Leu Leu Glu Gly Gly Glu Ile Gly Glu Leu Arg Glu Val Asn 145 150 155 160 Ala Ala Phe Gly Phe Pro Pro Leu Gly Pro Asp Asp Ile Arg Tyr Arg 165 170 175 Pro Asp Leu Gly Gly Gly Ala Leu Leu Asp Thr Gly Met Tyr Pro Leu 180 185 190 Ser Ala Ala Arg Leu Phe Leu Gly Pro Glu Leu Glu Val Val Gly Ala 195 200 205 Thr Leu Lys Leu Asp Pro Glu Arg Gly Val Asp Val Ser Gly Ser Ala 210 215 220 Leu Leu Ser Thr Pro Asp Gly Arg Thr Ala Gln Leu Thr Phe Gly Phe 225 230 235 240 Glu His Ala Tyr Arg Ser Asp Tyr Val Leu Trp Gly Ser Glu Gly Arg 245 250 255 Ile Val Val Glu Arg Ala Tyr Thr Pro Pro Thr Thr Arg Arg Pro Val 260 265 270 Ile Arg Leu Glu His Glu Glu Val Arg Arg Glu Leu Thr Leu Pro Pro 275 280 285 Glu Asp Gln Phe Ala Asn Thr Leu Arg Ala Phe Val Gly Ala Val Ala 290 295 300 Val Ala Gly Gln Arg Ala Ala Glu Thr Val Glu Ile Cys Ala Arg Ala 305 310 315 320 Arg Leu Ile Asp Glu Ile Arg Val Lys Ser Gly Gln Pro Arg Val Arg 325 330 335 Leu Pro His Glu Val 340 <210> SEQ ID NO 43 <211> LENGTH: 470 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 43 Val Ser Glu Leu Pro Ala Leu Ser Glu Gly Lys Leu Met Thr Ser Gly 1 5 10 15 Asp Asp Pro Met Glu Trp Trp Glu Pro Ala Ala Gly Gly Val Ser Pro 20 25 30 Ala Phe Arg Ser Trp Leu Ala Glu Arg Ser Thr Met Thr Ser Cys Glu 35 40 45 Val Glu Arg Ile Arg Leu Asp Glu Leu Arg Gly Trp Ala Phe Asp Glu 50 55 60 Thr Thr Gly Asn Leu Ala His Glu Ser Gly Arg Phe Phe Val Val Glu 65 70 75 80 Gly Val His Val Arg Thr Thr Tyr Gly Ala Val Ala Glu Trp Tyr Gln 85 90 95 Pro Ile Ile Asn Gln Pro Glu Ile Gly Ile Leu Gly Met Leu Met Lys 100 105 110 Leu Val Asp Gly Val Pro His Cys Leu Leu Gln Ala Lys Val Glu Pro 115 120 125 Gly Asn Ile Asn Met Met Gln Leu Ser Pro Thr Val Gln Ala Thr Arg 130 135 140 Ser Asn Tyr Thr Arg Val His Gly Gly Gly Gly Thr Arg Tyr Leu Glu 145 150 155 160 Tyr Phe Thr Arg Pro Gly Ala Gly Arg Val Leu Val Asp Val Leu Gln 165 170 175 Ser Glu Gln Gly Ser Trp Phe Leu His Lys Arg Asn Arg Asn Met Val 180 185 190 Val Leu Val Asp Asp Val Pro Pro Ser Asp Tyr His Tyr Trp Leu Pro 195 200 205 Leu His Glu Val Arg Arg Leu Leu Arg Ile Asp Val Leu Val Asn Met 210 215 220 Asp Thr Arg Ser Val Leu Ser Cys Leu Pro Ser Thr Phe Phe Ala Gly 225 230 235 240 Ser Gly Val Val Pro Ala Asn Ser Ala Met Ala Ala Ala Leu Ala Arg 245 250 255 Ser Ala Ser Gly Glu Gly Pro Ser His Arg Ser Thr Glu Ala Val Leu 260 265 270 Ser Trp Phe Thr Glu Ala Lys Ser Arg His Glu Leu Ala Val Ser Arg 275 280 285 Ile Ser Leu Arg Asp Leu Arg Gly Trp Arg His Thr Pro Tyr Glu Ile 290 295 300 Ser Arg Glu Asp Gly Arg His Phe Ser Ile Val Gly Val Thr Val Arg 305 310 315 320 Ile Asn Asn Arg Glu Val Thr Glu Trp Ser Gln Pro Leu Leu His Pro 325 330 335 Arg Gln Arg Gly Ile Ile Ala Phe Ala Leu Arg Ile Ile Asn Gly Val 340 345 350 Ala His Val Leu Val His Ala Arg Phe Gln Val Gly Leu Leu Asp Ala 355 360 365 Met Glu Met Gly Pro Thr Val Gln Cys Thr Pro Ser Ser Asp Ala Glu 370 375 380 Gln Arg Pro Pro Phe Leu Asp Leu Ile Leu Asn Ala Pro Ser Glu Arg 385 390 395 400 Ile Leu Phe Asp Thr Val Leu Ala Glu Glu Gly Gly Arg Phe Tyr Arg 405 
410 415 Ala Glu Asn Arg Tyr Met Leu Val Glu Val Gly Asp Asp Leu Pro Ala 420 425 430 Val Pro Asp Thr Phe Cys Trp Val Ala Val His Gln Leu Ala Thr Leu 435 440 445 Leu Arg His Gly Tyr Tyr Leu Asn Val Glu Ala Arg Ser Leu Leu Ala 450 455 460 Cys Leu His Ser Leu Trp 465 470 <210> SEQ ID NO 44 <211> LENGTH: 314 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 44 Val Arg Gly Gln Asn Ser Pro Ser Ala Arg Cys Gly Arg Gln Val His 1 5 10 15 Pro Asp Val Arg Leu Val Asn Gly Asp Leu Met Asp Gln Ser Ser Leu 20 25 30 Ile Ser Ala Val Asp Arg Val Arg Pro Asp Glu Ile Tyr Asn Leu Gly 35 40 45 Ala Leu Ser Tyr Val Pro Thr Ser Trp Arg Gln Pro Asn Thr Thr Ala 50 55 60 Glu Ile Thr Gly Thr Gly Val Val Arg Met Leu Glu Ala Val Arg Ile 65 70 75 80 Val Ala Gly Ile Thr Ser Ser Arg Thr Pro Gly Pro Ser Arg Pro Arg 85 90 95 Phe Tyr Gln Ala Ser Ser Ser Glu Met Phe Gly Lys Val Arg Glu Thr 100 105 110 Pro Gln Asn Glu Leu Thr Pro Phe His Pro Arg Ser Pro Tyr Gly Val 115 120 125 Ala Lys Val Phe Gly His Tyr Thr Val Gln Asn Tyr Arg Glu Ser Tyr 130 135 140 Gly Met Tyr Ala Val Ser Gly Met Leu Phe Asn His Glu Ser Pro Ile 145 150 155 160 Arg Gly Pro Glu Phe Val Thr Arg Lys Val Ser Leu Gly Ala Ala Ala 165 170 175 Val Lys Leu Gly Leu Arg Asp Ser Leu Arg Leu Gly Asn Leu Met Ala 180 185 190 Glu Arg Asp Trp Gly Phe Ala Gly Asp Tyr Val Arg Gly Met Ser Met 195 200 205 Met Leu Ala Gln Asp Glu Pro Asp Asp Tyr Val Leu Gly Thr Gly Ile 210 215 220 Ala His Ser Val Arg Glu Leu Val Glu Leu Ala Phe Ala His Val Asp 225 230 235 240 Leu Asp Trp Arg Asp His Val Val Leu Asp Glu Ala Leu Gln Arg Pro 245 250 255 Ala Glu Val Asp Leu Leu Cys Ala Asp Ala Thr Lys Ala Gln Gln Arg 260 265 270 Leu Gly Trp Lys Pro Thr Val Tyr Phe Glu Glu Leu Val Gly Met Met 275 280 285 Val Asp Ser Asp Leu Arg Leu Leu Ser Gly Pro Gln Cys Ser Glu Gly 290 295 300 Gly Val Ile Arg Glu Leu Ala Asp Leu Trp 305 310 <210> SEQ ID NO 45 <211> LENGTH: 277 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 45 Val Arg Ser Gly Ala Ala Ala Pro Ala Thr Pro Ile Ala Thr Leu Ala 1 5 10 15 Cys Gly Glu Ile Gly Ala Gly Trp Ala Ile Gly Tyr Glu Thr Thr Arg 20 25 30 Asp Asp Tyr Gly Asp Leu Ala Ser Gly Ala Val Leu Arg Ser Ala Pro 35 40 45 Gly Phe Pro Ala Phe Pro Val Arg Leu Ala Ser Glu Val Phe Gln His 50 55 60 Ala Leu Asp Val Arg Gly Gly Thr Asp Pro Ala Val Leu Trp Asp Pro 65 70 75 80 Cys Cys Gly Ser Gly Tyr Leu Leu Thr Val Leu Gly Ile Leu His Arg 85 90 95 Arg Ser Ile Ala Arg Leu Leu Ala Ser Asp Ile Asp Ala Asp Ala Ala 100 105 110 Leu Thr Leu Ala Ala Ala Asn Val Gly Leu Leu Ala Glu Gly Gly Leu 115 120 125 Ala Ala Arg Ala Ala Glu Leu Thr Glu Arg Ala Arg Arg Phe Asp Lys 130 135 140 Pro Gly Tyr Ala Glu Ala Ala Ala Ala Ala Gly Arg Leu Ser Arg Arg 145 150 155 160 Leu Glu Ala Ala Gly Gly Pro Leu Pro Tyr Ala Val Arg Gln Ala Asp 165 170 175 Val Phe Asp Pro Thr Ala Leu Ala Ser Ala Val Gly Gly Ser Glu Pro 180 185 190 Asp Leu Val Val Thr Asp Val Pro Tyr Gly Glu Gln Thr Glu Trp Thr 195 200 205 Gly Gly His Ala Gly Arg Gly Leu His Gly Met Leu Thr Ala Val Ala 210 215 220 Ser Val Leu Arg Pro Ala Ala Val Ile Ala Val Ile Val Arg Gly Arg 225 230 235 240 Arg Val Pro Pro Ile Gly Ala Ala Arg Pro Arg Arg Lys Leu Arg Val 245 250 255 Gly Thr Arg Ala Ala Ala Leu Phe Val Ala Ala Asp Leu Asn Asp Ala 260 265 270 Gly Pro Ala His Pro 275 <210> SEQ ID NO 46 <211> LENGTH: 49 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 46 Ala Arg Leu Gly Glu Leu Gln His Ser Ala Leu Asp Val Thr Arg Ala 1 5 10 15 Gly Arg Glu Leu Asn Trp Thr Ala Arg Thr Ala Leu Ala Glu Gly Ile 20 25 30 Ala Arg Ala Tyr Gln Trp Ile Arg Asp Gln Asp Gly Cys Asp Val Glu 35 40 45 Ala <210> SEQ ID NO 47 <211> LENGTH: 824 <212> TYPE: DNA <213> ORGANISM: M.
carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (7)..(480) <223> OTHER INFORMATION: ORF 40 (negative strandedness) incomplete: N-terminus only (C-terminus is on preceding DNA contig) <400> SEQUENCE: 47 gccgtacagc cggttgtaga gggccacgta ctgctccgcg cagtacttgg cggcgccgta 60 cggcgccgca ggctccgggc gggcgtcctc gggggacggg atcgcgctga tcgccccgta 120 cagggctccg ccggtggagg cgaacaccac ccgggccccg acggctcggg ccgccttcag 180 gacgttgacg gtgccgagca cgttgacccc ggtgtcgccg ctggcatccg cgaccgaggt 240 gcggacgtcg gcctgcgcgg cgaggtggta gatcaggtcc ggacgggcgt ccgccacgat 300 cgcggcgaga gccttcccgt cggtgatgga ctcctgatgg aaggcgacac ggacggccaa 360 ccggccgcac cggccggtgg agaggtcgtc gaccacggtg acggtgtcgc cgcgctccag 420 cagggcgtcg accaggtgtg agccgatgaa gccggcgcca cctgtcacga ggacgcgcat 480 ggacggggat ccgtggcgga agaaggaatt gacttcgttg gccctgcgat aaacagtatc 540 ttcacgaggc cctccgtgtg tgtccgccga atgtatatgg gaacggctcg ccggcacagg 600 ccggaaacgg ccccgcattg aagctcgagt gatacgccta gacttcaccg ccaccggcta 660 ctggagggcc tacgctaacc ggtgtccaca cattcgcggg ccgcatgtgc gttggcgtcg 720 ttcccgaccg tcagccatgc aatggtggtt tcggtcgtgg gtaggcgacc agggtcggaa 780 tagtgcaaaa ggaagcgggc gatggctaca gacacagcga attc 824 <210> SEQ ID NO 48 <211> LENGTH: 159 <212> TYPE: PRT <213> ORGANISM: M.
carbonacea <400> SEQUENCE: 48 Met Arg Val Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu 1 5 10 15 Val Asp Ala Leu Leu Glu Arg Gly Asp Thr Val Thr Val Val Asp Asp 20 25 30 Leu Ser Thr Gly Arg Cys Gly Arg Leu Ala Val Arg Val Ala Phe His 35 40 45 Gln Glu Ser Ile Thr Asp Gly Lys Ala Leu Ala Ala Ile Val Ala Asp 50 55 60 Ala Arg Pro Asp Leu Ile Tyr His Leu Ala Ala Gln Ala Asp Val Arg 65 70 75 80 Thr Ser Val Ala Asp Ala Ser Gly Asp Thr Gly Val Asn Val Leu Gly 85 90 95 Thr Val Asn Val Leu Lys Ala Ala Arg Ala Val Gly Ala Arg Val Val 100 105 110 Phe Ala Ser Thr Gly Gly Ala Leu Tyr Gly Ala Ile Ser Ala Ile Pro 115 120 125 Ser Pro Glu Asp Ala Arg Pro Glu Pro Ala Ala Pro Tyr Gly Ala Ala 130 135 140 Lys Tyr Cys Ala Glu Gln Tyr Val Ala Leu Tyr Asn Arg Leu Tyr 145 150 155 <210> SEQ ID NO 49 <211> LENGTH: 11115 <212> TYPE: DNA <213> ORGANISM: M. carbonacea <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (8)..(1207) <223> OTHER INFORMATION: ORF 41 (positive strandedness) incomplete: C-terminus only <221> NAME/KEY: misc_feature <222> LOCATION: (1213)..(2331) <223> OTHER INFORMATION: ORF 42 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (2364)..(3611) <223> OTHER INFORMATION: ORF 43 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (3623)..(4243) <223> OTHER INFORMATION: ORF 44 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (4149)..(5177) <223> OTHER INFORMATION: ORF 45 (positive strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (5177)..(6094) <223> OTHER INFORMATION: ORF 46 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (6271)..(7824) <223> OTHER INFORMATION: ORF 47 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (7903)..(8760) <223> OTHER INFORMATION: ORF 48 (negative strandedness) <221> NAME/KEY: misc_feature <222> LOCATION: (8781)..(9800) <223> OTHER INFORMATION: ORF 49 (negative strandedness) <400> SEQUENCE: 49 ccgcaccatg gtcgacctgc tgaccggcgt 
actcccgcag atccggtcgg aggccggtga 60 caacgaccgg gacggcacgt tcccggtcga ggtgttcggg cagttggcca agctcggcct 120 gatgggcgcg accgtgccca ccgcgctcgg cgggctcggc gtccaccgcc tgtacgacgt 180 cgccgtcgcc ctgatgcgcc tggccgaagc ggacgcctcc accgccctgg cactgcacgt 240 ccagctcagc cgcgggctca ccctgaccta cgaatggatg cacggctccc cgccggtgcg 300 ggcgctggcc gagcggctgc tgcgggcgat ggcgacgggg gaggccgccg tctgcggggc 360 actgaaggac gcgccgggcg tcctcaccga actgaccgcc gatggttccg gcggctggct 420 gctcaacggc cgcaagatcc tggtcagcat ggcgccgatc ggtacccact tcttcgtgca 480 cgcccagcgc cgggacgccg acggcaacgt ggtgctggcc gttccggtgg tgcggcgcga 540 cgcgcccggg ctgaccgtcg gcacgcactg ggacggcctc gggatgcggg cctccggcac 600 cctcgacgtc agcttccacg actgcccggt cgccgccgac cacgttctcg accgcgggcc 660 ggccggcgcg cgccgggacg ccgtcctggc cgggcagacg gtcagctcga tcaccatgct 720 cgggatctac gccggtgtcg cgcaggccgc gcgggacctc gccgtcgaga cgtacgcgcg 780 tcgtcgatcg cggccggcgg ccgccgccct cgccctggtg gccggcatcg acacgcggct 840 gtacacgctc cgggccgtcg ccggcgccgc gctgctcaac gcggacctcc tggccgcgga 900 cctgaccggc gatctcgacg agcgcgggcg cgggatgatg accccgttcc agtacgcgaa 960 gatgaccgtc aacgaactgg ccccggcggt cgtcgacgac tgcctctcgc tgctcggcgg 1020 ccaggcgtac gacgggcagc acccgttggc acggctctac cgcgacgtcc gggccggtgg 1080 gttcatgcag ccctacagct atgtggatgg cgtcgactac ctgagcggcc aggcgctggg 1140 cgcggaccgg gacaacgact acatgagcgt tcgggcgctc cgctccccgg atccggcggg 1200 agaaaggtga acatgaccat ccgagtgtgg gactacctgc cggaatacga gaaggaacgg 1260 gccgacctgc tcgacgcggt ggagacggtc ttcgagtcgg gcaacctcgt gctcggccgc 1320 agcgtgctcg gcttcgagac cgagttcgcc gcgtaccacg acgtggcgca ctgcgtcacg 1380 gtggacaacg gcaccaacgc gatcaagctg gccctgcagg cgttgggcgt ggggcccggc 1440 gacgaggtgg tcaccgtcgc caacacggcg gcgccgaccg tgctggcgat cgacgccgtc 1500 ggcgcgatcc cggtcttcgt ggacatccgg ccggacgact acctgatgga cacgacccag 1560 gtggccgacg tgatcacccc ggcgaccaag gctctgctgc ccgtccacct ctacggccag 1620 tgcgtggaga tggcgccgtt gcagcggctg gcccgcgagc acgggctgct ggtgctggag 1680 gactgcgcgc agtcgcacgg cgcacgacac gcagggcaac tcgccgggac 
catgggcgac 1740 gcggcggcct tctccttcta tccgacgaag gtgctgggcg cctacggtga cggcggcgcc 1800 gtgctcaccg gtagtgagac cgtggaccgt gacctgcgcc aactgcgcta ctacggcatg 1860 gagagcgtgt actacgtcgt gcagacgccc ggccacaaca gccggctgga cgaggtgcag 1920 gcggagattc tccggcgcaa gctgcgccgg ctcgacgagt acatcgccgg ccgccgcgcg 1980 gtggccgagc gctacgccgc cgggctgggc gacatcgccg aggcgaccgg gctcgtcctg 2040 cccgccctcg ccgacgccaa cgaacacgtc ttctacctct acgtcgtccg tcatccgcag 2100 cgggacgcga tcctggagca actgaagcgg cgtggaatca cgctgaacat cagttacccg 2160 tggccggtgc acaccatgac cggcttctcg aagctcggct atgccgccgg atcgctgccg 2220 gtcaccgagc ggatcgccga cgagatcttc tccctgccca tgtatccgtc cctgccggtc 2280 gacgtgcagg acacggtgat aggcgcattg cgcgacgtac tcacgacgct ctgagccgcc 2340 ggtagcactg gaggacgcca cccatgatca gcccagccga ccgggcacgg ccacgagcca 2400 cctgccgcgc ctgcggtgga accgtcgtgc agttcctcga cctcggccgc cagccactgt 2460 ccgaccgctt cctgaccgaa ccggagatcc cgcaggagta cttcttccag ctcgccgtcg 2520 gcctctgcga gacgtgcacg atggtgcagc tcatgcagga ggtcccccgg gagcggatgt 2580 tccacgagga ctacccgtac tactcgtccg gttccgccgt catgcagaag cactttgccg 2640 acaccgcccg gcaactgttg gagacggagg ccaccggccc ggacccgttc gtggtcgaga 2700 tcggctgcaa cgacggggtg atgctgcgga ccgtgcacga ggccggcgtc cggcacctgg 2760 gcttcgaacc gtcgggcaag gtcgccgaag cggcaagggc caagggcctt cgggtacgcg 2820 gggacttctt cgaggagtcc accgcccgtg aggtacgcgc gagcgacggc cccgcagatg 2880 tgatcttcgc ggcgaacacc atctgccaca tcccgtacct cgactcgatc ctgcggggtg 2940 tcgacgcgct gctcgggccg gacggcgtct tcgtcttcga ggacccctac ctgggcgaca 3000 tcctggcgaa gacgtcgttc gaccagatct acgacgagca cttcttcctg ttctcggcgc 3060 gctccgtgca ggcgttggcc gcgtcgttcg ggttcgagct ggtcgacgtg gaccggctgg 3120 ccgtgcacgg cggcgaggtc cgctacacgc tggcccgtgc gggtgcacgc cgcccggcgg 3180 accgggtggc cgcgctgatc gccgaggagg acgcgggcgg cgtcgcgacg ctggcccggc 3240 tggaccagtt cgctgcccag gtcggccgga tccgcgacga cctgcgggcg ttgctcgaac 3300 ggttgacggc ggagggcaaa cgggtggtgg cctacggggc gaccgccaag agcgcgaccg 3360 tggcgaactt ctgcggcatc gggccggacc tggtgtcgcg ggtgtacgac acgacgcccg 
3420 ccaaacaggg ccgcctgacc ccgggcacgc acatcccggt tcatgcggcg gacgagttcc 3480 cgaccgaccc gccggactac gcgctgctct tcgcgtggaa ccacgccgac gagatcatgg 3540 cgaaggagca ggcgttccgg caggccggcg gggcctggat cctgtacgtt ccgcacgttc 3600 acgtgcggga ttgagtgggg ccgtgcaggt agcaaccgaa ctcgccgtcg agggcgcgta 3660 cgtcttcacc ccgcgggtct ttcccgaccc gcggggggtc ttcgtgtccc cgtacctgga 3720 ctcggtcttc accgagacgc tcggatatcc gttgtttccc gtggcgcaga ccagctacag 3780 cgtctcccgc cgcggcgtcg tccgcgggct gcactacacc acgacgccgc ccggttcggc 3840 caagttcgtc tcgtgcccgt acggccgggt cctcgacgtg gtgctcgacg tccgggtcgg 3900 atcgccgacc ttcgggcgct gggacagcgt ggtcctcgac tcccagggct tcaggtcgct 3960 gtacctgccg acgggggtgg cgcacatgtt cgtcgccctg atggacgaca cggtgatgtc 4020 ttacctgctc tccacggagt acgtcttcga gaacgaacgg gcgttgtcac cgctcgacga 4080 cacgctcggc ctgcccgttc ccgccgacat cgagccgatc ctgtcggatc gggaccggac 4140 cgcgatcacc ttcgcccagg cccacgcggc cggggtgctc ccccggtacg agatctgcgc 4200 cgagatcgag gcgcgtttct gctcagggac cgcaccgtac ggcgtagacc gtgcagaaga 4260 tccacctggg cgcgccatcc ggtcacggcg gtgaaggcgg atgcgtcgac caccagactg 4320 tggaagtcgg tctgccgggc actggcgggc ggcgcgacgg agacgaccgg caccggcgga 4380 cgccccgtct cctccgcgac cagcgcggcg atcgtccgga aaaggtcacc gaccggttcg 4440 ccgcggcggc tgccgagcgg ccagtgccgg cagacgagcg catcggcatg gtcgagggcg 4500 gcgacgaagg cgctcgccgc gtcgtccacg tagagcagct ctcgctggat ggtgccgtcg 4560 tgccacatgg tcaggggttc gccggcgagc gcgcggcgga tcatcgtcga cacgaccccg 4620 cggtcgtcgc cgccacccgg gcgggccggc ccgaagacgg tgggcaggcg cagcgtgacc 4680 ccgcggagga tgccatcggc ggaggcccgg tcgagcagcg cttcggcggc cgccttctgc 4740 cggtcgtatc cggtctccgg gtggtcgggt tcggtcccgt cgacgggcat ccgctgcgcc 4800 cggccgacct gtgacgccga gccggcgaag acgaccaccc gtggcccggt cccggcccgc 4860 gcaacctcga ccaggtcacg gacgacgccg aggttcaccc gcgccgcggt gccgtcgccg 4920 tcggcgctgc gccacccggc ggtgttgagc accaggttga tcacggcatc cgcgccctcg 4980 accgccgccg cgaccgcgcc tgtctcggtg aggtcggcgg tgaccacctc gaaatccgcg 5040 gcggccggct ccggtgccac agcggatcgg cgggacaccg cccggacggt gactgggcgg 5100 
tcggccagcg cggtcaggac ggccgagccc acgaaaccgg acgcgccgag caccgcgatc 5160 agcgggcggt ccgtcatcgc ccggcagctc cggaccggta cagcgcctgc cagtagaagc 5220 tccacgggac ggctctcgca tctgcgagca gcgcccagtc gcggcccggc cggtaggcgt 5280 cgatgaaccg gcgggactgc gccgcgacgg cgcggtcgtc gaagatgtcc acgatgtccc 5340 tcagcaggcc ggtccgcaac gcgagctgga cgaccgcgtc gccctgcgag tgcccgtcca 5400 ggctcttgtc gaggtacttg ccgaccgggt tgttgacgta gaggtggtcg gcatgggtgg 5460 cgatgaggtc caggtaggcg ccgacggtgt ccggggtcat ctccgcgaac gagtcgatgt 5520 tgatggccag gtcgaaccgg agctcccgca gggcgccgcc ggcctccgcc tggtcgacgc 5580 cgtgaaagtg caccttggca agctgctcgt cggtcagcac cgcgccgagg tagcggctgg 5640 ccaggtcgag cgagttctcc agatcgacga tgtggtacgc ggcgatctcg tggttggaca 5700 gcagcgcgtg gcaggtccgc ccgtagccgg cgccgatctc caggatgctc gtaccgtcga 5760 gggtcatccg gctctcgatg aactccacct cgagcaccgc ctgcaggtag tccatgcaga 5820 ccgcttcgcc gtcgtaggtg atcgagaacg gatcgccgac ctcgcggttg gcgatgcggc 5880 gcagccgggc ccagttggcc gggctcaggc ccgccgcgag ggtgaagacg agcgttttca 5940 gatagcgcac accattgacc cgcgggtccc agagcgccag cttgtagttg acctcgctgg 6000 acttgaagtt gctcaggtcg ccgacggcct ccctggtgac ctgggtgttg ttgtagagct 6060 cccagagcgg gctgcggccg tacgtctggc tcatgtgccc cccccgcgcc gatcgaatca 6120 ctcgggatgg tgaccgtacc ggctatttac tagcggttcg cctagagcca ccgttcaaga 6180 tcacggtgac aggggctcgc ctaccccgcg cgtcgccggc cgtacgcccc cactcgcggg 6240 gcgcaccggc cggcgacgct gtcgtcgtta cggggtcacg gagagcaggt gggccttgta 6300 gccctcaccg taggtgctgg tcggggtgcc gttgtagtcc tggatgagga cgttgccgcc 6360 gctgcatccc cacgggttcc aggtcacgcc gtgtagccga tccgcttgga gtccagccag 6420 agtcatcacc tggtcgatgt agtcgtgggc gcaggtgtcc tggccgatct cgccggcgtg 6480 caccgggcac ctgcggcggc gacggcgccg atctggctgt cccagcagga ggcgggtgac 6540 gcaggcgttg aagttgtacg agtgccacga cgccacgatg ttgccgagcg ggtcgttcgg 6600 cttgtaggtg agccactggc tcaggtcgtt ggtccaggtc aggccggcga ccagcaggac 6660 gttgctggca ccggtggccc ggacggcgtc gaccaggtcc tgcatgccgg cgacctcgta 6720 ggtgatgccg gtgcaggtgc cgccgtcgcg caggcagcgc cacgcggcag ccatgtccga 6780 ccagttgttg 
gcggcgtccg ggtagggctc gttgaacagg tcgaacacca cggcgtcgtt 6840 gcccttgaag gcgttggcga cgccggtcca gaactgcgga gtgtgctgca tgctgggcat 6900 cggcttctgg caggtggcgt tgacgtcggc gcaggcggag atgttgccgg tgtactgccc 6960 gtgggtccag tgcaggtcga ggatcgggtt gatcccgttg gccacgagca ggttcacgta 7020 gtccttgacg gcctgctggt acgtcgcgcc gctgggcgag ccggagaggc cgagccagca 7080 gtcctcgttg agcgggatcc gcacggcgcg gatgttccac gccttcatgg cgttgaccga 7140 ggcctggtcg acggggccgc tgtcccacat gcccttgccc tgcacgcagg cgaactcacc 7200 gctggcccgg ttgactccga gcagccggta ggtcgccccg ctcgccgtca ccagccggtt 7260 gccggagacc ttcagcgcgg gcgcggcccc ggtcggcggc ggggtcgtgg gtgggggagt 7320 ggtgggcggt ggggtcgtgg gcgggggagt ggtgggcggc ggggtcgtcg gcggcggggt 7380 ggtggtcggc tccggggtcg gcgaggtcac cgagccggtg caggtcgtgc cgttgagcgc 7440 gaacgacttc ggcacggggt tgctgccgct ccacgagccg ttgaagccga tcgtggtcga 7500 tccgcccgtg cccagcgatc cgttccagct caggctggcg gccgagacgc tcgtgccgga 7560 ctgcgaccag gtggcgctcc agccctgggt gacctgctgg ccgctggtcg ggaagtcgaa 7620 ggtcagcgtc cagccggtga gggcggagcc gaggttggtg atggcgacgt tcccgctgaa 7680 cccgcctgtc cactggctct gcacggtgta tgccacggag cagccggtgg ccgcggccga 7740 ggcggggaag gtgagcccgg ccaggccggc ggcgaccagg gtcgcggtgg tgccgacggc 7800 cagcagggca tgacgatgtc tcatctgatc tcctcgtggt cgagagggga tcgtccgatg 7860 ggagcgcatc gaagagcttt gtttatttac ctcactaagt caagctgacg tccggccctg 7920 cttcccggcc ggcgcggggc cggtggtgtg ccgggcgatc accgtctcgg tggggcacca 7980 ccgttcccgg accggctcgc cgtcgagcag ggcgagcacg gaggcggcca ccagcgcgcc 8040 gaactcgtgc acgtcgaggc tcatcgtggt gagctgcggg gaggacaggc ggcacaggct 8100 ggagtcgtcc caggcgagca tgctcaggtc gcgggggacc gccaggccca gctccctggc 8160 cacctccagg ccgccgaccg ccatcaggtc gttgtcgtag atgatcgcgc tgggcgggtc 8220 gccgtcgcgc aacagccgga cggtcgccgc cgcgcccgac tcctccgagt agtcgccggt 8280 caggaccacg gcgtcgatgc cggccggggc ggcggcggcc agcaacgcgg ccgtgcgggt 8340 gcgggtgtgc cgcaggctgt cgggcccgct gatccgcgcg atccggcggt gcccgaggcc 8400 ggccaggtgc gcgaccgccg cccgtaccga gccgacgtcg tcgcgccgca cggccggggt 8460 gtcgccggct ggctcgccgg 
ccacgaccac ggggaggccc aggtcgcgca ggaccgccgg 8520 ccgggggtcc gcggcggtcg ggttcaccag caccacggcc tcggccagcc ggagctgtgc 8580 ccaccggcgg taggcggcga tctcggcggc gtggtcggcg acgatgtgca gcagcaccga 8640 ccggccgtgc tcggcgagac gttcctcgat gccggagatg aactccatga agaacggctc 8700 ggcgccgagc aggcggggcg cccgggcgag caccaggccg accgcgctcg tcgtgctcat 8760 tccgccccca tcacaggtca gcgggccgat ccgggcagcg gcgcgaactc cccgtcgagg 8820 ctctccgaca cccggagccc gcggtggaac gggaccagct cggactccag gaccagggcg 8880 gcggccccga cggcgacgcg gtggcggcgg accgggacag ccgtacgtcg aacggggtgc 8940 gcggagcggg agaacaccgt gtgcagctcc tggcggagca cgggcaggta gaccgagccg 9000 gcgacggcga agcccggccc ggtcagcacg atgacctcca ggtccatcac gttggccagg 9060 gtgcgggcgg ccgccgcgac gtaccgcgcg gacctctcgc acagcgccag ggcccgctgc 9120 tccccgcgcc gggcggcgcg gccgatcgcg gcgaagtcgc gggccaccgc cgcggggccg 9180 gcgccggtcg tgaggccgag cgcccgggcc aggccggcgt ccgcccgccc cgccgcgacc 9240 acggcggcgg gcccggcgac ggcctccacg cacccccgcg cgccgcacca gcagggcggg 9300 ccgtccgcgg ccacgcagac gtgccccagc tcgccggcgt tgccactcgg tccgcgatag 9360 gtgatcccgt cgatgaccag cccggcgccg aggccgctgc ccatgtagag ggcggcggcg 9420 gcgctcgccg tgccgaaccc gcccgcccag tgttcgccca gggcggcggc tgtggcgtcg 9480 ttgtccagca ccaccggcag ctcggtcgcc tgctccagcg ccgcgccgag cgggaactcc 9540 cgccagtgcc gcagctcggg gttcaggccg gcgacccccc ggccggtgag cgggccgggg 9600 aagaccagcc ccaccccgac caaccgggcc cggtccacgc cgacgctgtc gaccagggtc 9660 ggtatctcgg ccgcgatccg ggagacgacc gtcgcgggcg cctcgacgcc gacgccgggc 9720 cgggagatcc gggccaccac gatcccggtc agatcggtca ggacgtacgt catgacgccc 9780 tggccgaggc acacgcccac cgcgtaccgg gaggtgtggt tgagccggag caggacccgc 9840 cgttttccgc cggtcgactc ggctgtggcc cggtctcgac gaccaggccc tcgtcgatga 9900 gcttgccgga cgaggttgga aatggtgggg ccgcagtgaa gccggtcacg ctgatcaggc 9960 ccacccggct gatcgtgccg gctgcccgga tggcgtcagc accgcggcct tgctgctcgc 10020 gtgcggcagc ttgtccgcct cggtccgctc actgctcccc ggtcacgccg tccgaatccc 10080 cagcagagca taggcgttcg ccgctgccgc cgcgcacgcc taccggcggc cggcgcgcgg 10140 ggccggccgc ccgtcccccg 
tcgggggcga cgctcagcgt cgcagttcgc gccagcccga 10200 gctgcctccg cgccacgccc ccgcgaggat gctcgcggag ctgtggctgg agcattccag 10260 caccttcgag cggccgtgct caccaccgat ctgcgcctgg acctcccagc gctgaccatc 10320 ggtgcggatg tagacgtcgc gacgccctcg tttggttgcg tccccgttcc accagtgttc 10380 ctcgacgatc acctcgcgac ggtaccagca ggcgcgggtg cggcggcagc cgcgacggca 10440 agggaatctc gcacgggccg gccgggcccg ggcccgtgcc ggcccgacgc cgcgactcac 10500 gtgcggcggc tcagcgcgcg gcgcgcagcg cctcggcgag ggtggtcggc tcgcggccga 10560 tcagcttcgc caggtcgtcg ccgtcgacgt acagctcgcc ccgggccagg cccaggtcgc 10620 tgtcggccag gacggcggcg aagccctcgg gcaggccggc ggagaccagc acctcggtga 10680 gcttctccgc cggcaggtcg gtgtagccga cggcctggcc ggtctgccgg gacacctcct 10740 cggccagctc ggtcagggtg aacgccgggc cgccgagctc gtacacccgg ttggtctcgg 10800 cggcgccggt gagcgccgcg gcggcggcct cggcgtagtc cgcgcgggtc gcggcgctga 10860 cccgcccgtc gcccgccgcg ccggtcacac cgaactggag gtacgtcgcg agctggtcgg 10920 tgtagttctc caggtaccac ccgttgcgca ggatcacgta cggcaggccc gacgcggtga 10980 tctcccgctc ggtggcgagg tgctccccgg cgaggatcat gccggagcgg tcggcgttgg 11040 cgatgctggt gtagacgacc agcccgacgc cggcctcgcg ggcggcggcg acgacgttgt 11100 ggtgctgggc gacgc 11115 <210> SEQ ID NO 50 <211> LENGTH: 400 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 50 Met Val Asp Leu Leu Thr Gly Val Leu Pro Gln Ile Arg Ser Glu Ala 1 5 10 15 Gly Asp Asn Asp Arg Asp Gly Thr Phe Pro Val Glu Val Phe Gly Gln 20 25 30 Leu Ala Lys Leu Gly Leu Met Gly Ala Thr Val Pro Thr Ala Leu Gly 35 40 45 Gly Leu Gly Val His Arg Leu Tyr Asp Val Ala Val Ala Leu Met Arg 50 55 60 Leu Ala Glu Ala Asp Ala Ser Thr Ala Leu Ala Leu His Val Gln Leu 65 70 75 80 Ser Arg Gly Leu Thr Leu Thr Tyr Glu Trp Met His Gly Ser Pro Pro 85 90 95 Val Arg Ala Leu Ala Glu Arg Leu Leu Arg Ala Met Ala Thr Gly Glu 100 105 110 Ala Ala Val Cys Gly Ala Leu Lys Asp Ala Pro Gly Val Leu Thr Glu 115 120 125 Leu Thr Ala Asp Gly Ser Gly Gly Trp Leu Leu Asn Gly Arg Lys Ile 130 135 140 Leu Val Ser Met Ala Pro Ile Gly Thr His Phe Phe Val His Ala Gln 145 150 155 160 Arg Arg Asp Ala Asp Gly Asn Val Val Leu Ala Val Pro Val Val Arg 165 170 175 Arg Asp Ala Pro Gly Leu Thr Val Gly Thr His Trp Asp Gly Leu Gly 180 185 190 Met Arg Ala Ser Gly Thr Leu Asp Val Ser Phe His Asp Cys Pro Val 195 200 205 Ala Ala Asp His Val Leu Asp Arg Gly Pro Ala Gly Ala Arg Arg Asp 210 215 220 Ala Val Leu Ala Gly Gln Thr Val Ser Ser Ile Thr Met Leu Gly Ile 225 230 235 240 Tyr Ala Gly Val Ala Gln Ala Ala Arg Asp Leu Ala Val Glu Thr Tyr 245 250 255 Ala Arg Arg Arg Ser Arg Pro Ala Ala Ala Ala Leu Ala Leu Val Ala 260 265 270 Gly Ile Asp Thr Arg Leu Tyr Thr Leu Arg Ala Val Ala Gly Ala Ala 275 280 285 Leu Leu Asn Ala Asp Leu Leu Ala Ala Asp Leu Thr Gly Asp Leu Asp 290 295 300 Glu Arg Gly Arg Gly Met Met Thr Pro Phe Gln Tyr Ala Lys Met Thr 305 310 315 320 Val Asn Glu Leu Ala Pro Ala Val Val Asp Asp Cys Leu Ser Leu Leu 325 330 335 Gly Gly Gln Ala Tyr Asp Gly Gln His Pro Leu Ala Arg Leu Tyr Arg 340 345 350 Asp Val Arg Ala Gly Gly Phe Met Gln Pro Tyr Ser Tyr Val Asp Gly 355 360 365 Val Asp Tyr Leu Ser Gly Gln Ala Leu Gly Ala Asp Arg Asp Asn Asp 370 375 380 Tyr Met Ser Val Arg Ala Leu Arg Ser Pro Asp Pro Ala Gly Glu Arg 385 390 395 400 <210> SEQ ID NO 51 <211> LENGTH: 373 <212> TYPE: PRT <213> 
ORGANISM: M. carbonacea <400> SEQUENCE: 51 Met Thr Ile Arg Val Trp Asp Tyr Leu Pro Glu Tyr Glu Lys Glu Arg 1 5 10 15 Ala Asp Leu Leu Asp Ala Val Glu Thr Val Phe Glu Ser Gly Asn Leu 20 25 30 Val Leu Gly Arg Ser Val Leu Gly Phe Glu Thr Glu Phe Ala Ala Tyr 35 40 45 His Asp Val Ala His Cys Val Thr Val Asp Asn Gly Thr Asn Ala Ile 50 55 60 Lys Leu Ala Leu Gln Ala Leu Gly Val Gly Pro Gly Asp Glu Val Val 65 70 75 80 Thr Val Ala Asn Thr Ala Ala Pro Thr Val Leu Ala Ile Asp Ala Val 85 90 95 Gly Ala Ile Pro Val Phe Val Asp Ile Arg Pro Asp Asp Tyr Leu Met 100 105 110 Asp Thr Thr Gln Val Ala Asp Val Ile Thr Pro Ala Thr Lys Ala Leu 115 120 125 Leu Pro Val His Leu Tyr Gly Gln Cys Val Glu Met Ala Pro Leu Gln 130 135 140 Arg Leu Ala Arg Glu His Gly Leu Leu Val Leu Glu Asp Cys Ala Gln 145 150 155 160 Ser His Gly Ala Arg His Ala Gly Gln Leu Ala Gly Thr Met Gly Asp 165 170 175 Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly 180 185 190 Asp Gly Gly Ala Val Leu Thr Gly Ser Glu Thr Val Asp Arg Asp Leu 195 200 205 Arg Gln Leu Arg Tyr Tyr Gly Met Glu Ser Val Tyr Tyr Val Val Gln 210 215 220 Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu 225 230 235 240 Arg Arg Lys Leu Arg Arg Leu Asp Glu Tyr Ile Ala Gly Arg Arg Ala 245 250 255 Val Ala Glu Arg Tyr Ala Ala Gly Leu Gly Asp Ile Ala Glu Ala Thr 260 265 270 Gly Leu Val Leu Pro Ala Leu Ala Asp Ala Asn Glu His Val Phe Tyr 275 280 285 Leu Tyr Val Val Arg His Pro Gln Arg Asp Ala Ile Leu Glu Gln Leu 290 295 300 Lys Arg Arg Gly Ile Thr Leu Asn Ile Ser Tyr Pro Trp Pro Val His 305 310 315 320 Thr Met Thr Gly Phe Ser Lys Leu Gly Tyr Ala Ala Gly Ser Leu Pro 325 330 335 Val Thr Glu Arg Ile Ala Asp Glu Ile Phe Ser Leu Pro Met Tyr Pro 340 345 350 Ser Leu Pro Val Asp Val Gln Asp Thr Val Ile Gly Ala Leu Arg Asp 355 360 365 Val Leu Thr Thr Leu 370 <210> SEQ ID NO 52 <211> LENGTH: 416 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 52 Met Ile Ser Pro Ala Asp Arg Ala Arg Pro Arg Ala Thr Cys Arg Ala 1 5 10 15 Cys Gly Gly Thr Val Val Gln Phe Leu Asp Leu Gly Arg Gln Pro Leu 20 25 30 Ser Asp Arg Phe Leu Thr Glu Pro Glu Ile Pro Gln Glu Tyr Phe Phe 35 40 45 Gln Leu Ala Val Gly Leu Cys Glu Thr Cys Thr Met Val Gln Leu Met 50 55 60 Gln Glu Val Pro Arg Glu Arg Met Phe His Glu Asp Tyr Pro Tyr Tyr 65 70 75 80 Ser Ser Gly Ser Ala Val Met Gln Lys His Phe Ala Asp Thr Ala Arg 85 90 95 Gln Leu Leu Glu Thr Glu Ala Thr Gly Pro Asp Pro Phe Val Val Glu 100 105 110 Ile Gly Cys Asn Asp Gly Val Met Leu Arg Thr Val His Glu Ala Gly 115 120 125 Val Arg His Leu Gly Phe Glu Pro Ser Gly Lys Val Ala Glu Ala Ala 130 135 140 Arg Ala Lys Gly Leu Arg Val Arg Gly Asp Phe Phe Glu Glu Ser Thr 145 150 155 160 Ala Arg Glu Val Arg Ala Ser Asp Gly Pro Ala Asp Val Ile Phe Ala 165 170 175 Ala Asn Thr Ile Cys His Ile Pro Tyr Leu Asp Ser Ile Leu Arg Gly 180 185 190 Val Asp Ala Leu Leu Gly Pro Asp Gly Val Phe Val Phe Glu Asp Pro 195 200 205 Tyr Leu Gly Asp Ile Leu Ala Lys Thr Ser Phe Asp Gln Ile Tyr Asp 210 215 220 Glu His Phe Phe Leu Phe Ser Ala Arg Ser Val Gln Ala Leu Ala Ala 225 230 235 240 Ser Phe Gly Phe Glu Leu Val Asp Val Asp Arg Leu Ala Val His Gly 245 250 255 Gly Glu Val Arg Tyr Thr Leu Ala Arg Ala Gly Ala Arg Arg Pro Ala 260 265 270 Asp Arg Val Ala Ala Leu Ile Ala Glu Glu Asp Ala Gly Gly Val Ala 275 280 285 Thr Leu Ala Arg Leu Asp Gln Phe Ala Ala Gln Val Gly Arg Ile Arg 290 295 300 Asp Asp Leu Arg Ala Leu Leu Glu Arg Leu Thr Ala Glu Gly Lys Arg 305 310 315 320 Val Val Ala Tyr Gly Ala Thr Ala Lys Ser Ala Thr Val Ala Asn Phe 325 330 335 Cys Gly Ile Gly Pro Asp Leu Val Ser Arg Val Tyr Asp Thr Thr Pro 340 345 350 Ala Lys Gln Gly Arg Leu Thr Pro Gly Thr His Ile Pro Val His Ala 355 360 365 Ala Asp Glu Phe Pro Thr Asp Pro Pro Asp Tyr Ala Leu Leu Phe Ala 370 375 380 Trp Asn His Ala Asp Glu Ile Met Ala Lys Glu Gln Ala Phe Arg Gln 385 390 395 400 Ala Gly Gly Ala Trp Ile Leu Tyr Val Pro His Val His Val Arg Asp 405 
410 415 <210> SEQ ID NO 53 <211> LENGTH: 207 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 53 Val Gln Val Ala Thr Glu Leu Ala Val Glu Gly Ala Tyr Val Phe Thr 1 5 10 15 Pro Arg Val Phe Pro Asp Pro Arg Gly Val Phe Val Ser Pro Tyr Leu 20 25 30 Asp Ser Val Phe Thr Glu Thr Leu Gly Tyr Pro Leu Phe Pro Val Ala 35 40 45 Gln Thr Ser Tyr Ser Val Ser Arg Arg Gly Val Val Arg Gly Leu His 50 55 60 Tyr Thr Thr Thr Pro Pro Gly Ser Ala Lys Phe Val Ser Cys Pro Tyr 65 70 75 80 Gly Arg Val Leu Asp Val Val Leu Asp Val Arg Val Gly Ser Pro Thr 85 90 95 Phe Gly Arg Trp Asp Ser Val Val Leu Asp Ser Gln Gly Phe Arg Ser 100 105 110 Leu Tyr Leu Pro Thr Gly Val Ala His Met Phe Val Ala Leu Met Asp 115 120 125 Asp Thr Val Met Ser Tyr Leu Leu Ser Thr Glu Tyr Val Phe Glu Asn 130 135 140 Glu Arg Ala Leu Ser Pro Leu Asp Asp Thr Leu Gly Leu Pro Val Pro 145 150 155 160 Ala Asp Ile Glu Pro Ile Leu Ser Asp Arg Asp Arg Thr Ala Ile Thr 165 170 175 Phe Ala Gln Ala His Ala Ala Gly Val Leu Pro Arg Tyr Glu Ile Cys 180 185 190 Ala Glu Ile Glu Ala Arg Phe Ala Gln Gly Pro His Arg Thr Ala 195 200 205 <210> SEQ ID NO 54 <211> LENGTH: 343 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 54 Met Thr Asp Arg Pro Leu Ile Ala Val Leu Gly Ala Ser Gly Phe Val 1 5 10 15 Gly Ser Ala Val Leu Thr Ala Leu Ala Asp Arg Pro Val Thr Val Arg 20 25 30 Ala Val Ser Arg Arg Ser Ala Val Ala Pro Glu Pro Ala Ala Ala Asp 35 40 45 Phe Glu Val Val Thr Ala Asp Leu Thr Glu Thr Gly Ala Val Ala Ala 50 55 60 Ala Val Glu Gly Ala Asp Ala Val Ile Asn Leu Val Leu Asn Thr Ala 65 70 75 80 Gly Trp Arg Ser Ala Asp Gly Asp Gly Thr Ala Ala Arg Val Asn Leu 85 90 95 Gly Val Val Arg Asp Leu Val Glu Val Ala Arg Ala Gly Thr Gly Pro 100 105 110 Arg Val Val Val Phe Ala Gly Ser Ala Ser Gln Val Gly Arg Ala Gln 115 120 125 Arg Met Pro Val Asp Gly Thr Glu Pro Asp His Pro Glu Thr Gly Tyr 130 135 140 Asp Arg Gln Lys Ala Ala Ala Glu Ala Leu Leu Asp Arg Ala Ser Ala 145 150 155 160 Asp Gly Ile Leu Arg Gly Val Thr Leu Arg Leu Pro Thr Val Phe Gly 165 170 175 Pro Ala Arg Pro Gly Gly Gly Asp Asp Arg Gly Val Val Ser Thr Met 180 185 190 Ile Arg Arg Ala Leu Ala Gly Glu Pro Leu Thr Met Trp His Asp Gly 195 200 205 Thr Ile Gln Arg Glu Leu Leu Tyr Val Asp Asp Ala Ala Ser Ala Phe 210 215 220 Val Ala Ala Leu Asp His Ala Asp Ala Leu Val Cys Arg His Trp Pro 225 230 235 240 Leu Gly Ser Arg Arg Gly Glu Pro Val Gly Asp Leu Phe Arg Thr Ile 245 250 255 Ala Ala Leu Val Ala Glu Glu Thr Gly Arg Pro Pro Val Pro Val Val 260 265 270 Ser Val Ala Pro Pro Ala Ser Ala Arg Gln Thr Asp Phe His Ser Leu 275 280 285 Val Val Asp Ala Ser Ala Phe Thr Ala Val Thr Gly Trp Arg Ala Gln 290 295 300 Val Asp Leu Leu His Gly Leu Arg Arg Thr Val Arg Ser Leu Ser Arg 305 310 315 320 Asn Ala Pro Arg Ser Arg Arg Arg Ser Arg Thr Gly Gly Ala Pro Arg 325 330 335 Pro Arg Gly Pro Gly Arg Arg 340 <210> SEQ ID NO 55 <211> LENGTH: 306 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 55 Met Ser Gln Thr Tyr Gly Arg Ser Pro Leu Trp Glu Leu Tyr Asn Asn 1 5 10 15 Thr Gln Val Thr Arg Glu Ala Val Gly Asp Leu Ser Asn Phe Lys Ser 20 25 30 Ser Glu Val Asn Tyr Lys Leu Ala Leu Trp Asp Pro Arg Val Asn Gly 35 40 45 Val Arg Tyr Leu Lys Thr Leu Val Phe Thr Leu Ala Ala Gly Leu Ser 50 55 60 Pro Ala Asn Trp Ala Arg Leu Arg Arg Ile Ala Asn Arg Glu Val Gly 65 70 75 80 Asp Pro Phe Ser Ile Thr Tyr Asp Gly Glu Ala Val Cys Met Asp Tyr 85 90 95 Leu Gln Ala Val Leu Glu Val Glu Phe Ile Glu Ser Arg Met Thr Leu 100 105 110 Asp Gly Thr Ser Ile Leu Glu Ile Gly Ala Gly Tyr Gly Arg Thr Cys 115 120 125 His Ala Leu Leu Ser Asn His Glu Ile Ala Ala Tyr His Ile Val Asp 130 135 140 Leu Glu Asn Ser Leu Asp Leu Ala Ser Arg Tyr Leu Gly Ala Val Leu 145 150 155 160 Thr Asp Glu Gln Leu Ala Lys Val His Phe His Gly Val Asp Gln Ala 165 170 175 Glu Ala Gly Gly Ala Leu Arg Glu Leu Arg Phe Asp Leu Ala Ile Asn 180 185 190 Ile Asp Ser Phe Ala Glu Met Thr Pro Asp Thr Val Gly Ala Tyr Leu 195 200 205 Asp Leu Ile Ala Thr His Ala Asp His Leu Tyr Val Asn Asn Pro Val 210 215 220 Gly Lys Tyr Leu Asp Lys Ser Leu Asp Gly His Ser Gln Gly Asp Ala 225 230 235 240 Val Val Gln Leu Ala Leu Arg Thr Gly Leu Leu Arg Asp Ile Val Asp 245 250 255 Ile Phe Asp Asp Arg Ala Val Ala Ala Gln Ser Arg Arg Phe Ile Asp 260 265 270 Ala Tyr Arg Pro Gly Arg Asp Trp Ala Leu Leu Ala Asp Ala Arg Ala 275 280 285 Val Pro Trp Ser Phe Tyr Trp Gln Ala Leu Tyr Arg Ser Gly Ala Ala 290 295 300 Gly Arg 305 <210> SEQ ID NO 56 <211> LENGTH: 518 <212> TYPE: PRT <213> ORGANISM: M. 
carbonacea <400> SEQUENCE: 56 Met Arg His Arg His Ala Leu Leu Ala Val Gly Thr Thr Ala Thr Leu 1 5 10 15 Val Ala Ala Gly Leu Ala Gly Leu Thr Phe Pro Ala Ser Ala Ala Ala 20 25 30 Thr Gly Cys Ser Val Ala Tyr Thr Val Gln Ser Gln Trp Thr Gly Gly 35 40 45 Phe Ser Gly Asn Val Ala Ile Thr Asn Leu Gly Ser Ala Leu Thr Gly 50 55 60 Trp Thr Leu Thr Phe Asp Phe Pro Thr Ser Gly Gln Gln Val Thr Gln 65 70 75 80 Gly Trp Ser Ala Thr Trp Ser Gln Ser Gly Thr Ser Val Ser Ala Ala 85 90 95 Ser Leu Ser Trp Asn Gly Ser Leu Gly Thr Gly Gly Ser Thr Thr Ile 100 105 110 Gly Phe Asn Gly Ser Trp Ser Gly Ser Asn Pro Val Pro Lys Ser Phe 115 120 125 Ala Leu Asn Gly Thr Thr Cys Thr Gly Ser Val Thr Ser Pro Thr Pro 130 135 140 Glu Pro Thr Thr Thr Pro Pro Pro Thr Thr Pro Pro Pro Thr Thr Pro 145 150 155 160 Pro Pro Thr Thr Pro Pro Pro Thr Thr Pro Pro Pro Thr Thr Pro Pro 165 170 175 Pro Thr Gly Ala Ala Pro Ala Leu Lys Val Ser Gly Asn Arg Leu Val 180 185 190 Thr Ala Ser Gly Ala Thr Tyr Arg Leu Leu Gly Val Asn Arg Ala Ser 195 200 205 Gly Glu Phe Ala Cys Val Gln Gly Lys Gly Met Trp Asp Ser Gly Pro 210 215 220 Val Asp Gln Ala Ser Val Asn Ala Met Lys Ala Trp Asn Ile Arg Ala 225 230 235 240 Val Arg Ile Pro Leu Asn Glu Asp Cys Trp Leu Gly Leu Ser Gly Ser 245 250 255 Pro Ser Gly Ala Thr Tyr Gln Gln Ala Val Lys Asp Tyr Val Asn Leu 260 265 270 Leu Val Ala Asn Gly Ile Asn Pro Ile Leu Asp Leu His Trp Thr His 275 280 285 Gly Gln Tyr Thr Gly Asn Ile Ser Ala Cys Ala Asp Val Asn Ala Thr 290 295 300 Cys Gln Lys Pro Met Pro Ser Met Gln His Thr Pro Gln Phe Trp Thr 305 310 315 320 Gly Val Ala Asn Ala Phe Lys Gly Asn Asp Ala Val Val Phe Asp Leu 325 330 335 Phe Asn Glu Pro Tyr Pro Asp Ala Ala Asn Asn Trp Ser Asp Met Ala 340 345 350 Ala Ala Trp Arg Cys Leu Arg Asp Gly Gly Thr Cys Thr Gly Ile Thr 355 360 365 Tyr Glu Val Ala Gly Met Gln Asp Leu Val Asp Ala Val Arg Ala Thr 370 375 380 Gly Ala Ser Asn Val Leu Leu Val Ala Gly Leu Thr Trp Thr Asn Asp 385 390 395 400 Leu Ser Gln Trp Leu Thr Tyr Lys Pro Asn Asp Pro Leu Gly Asn Ile 405 
410 415 Val Ala Ser Trp His Ser Tyr Asn Phe Asn Ala Cys Val Thr Arg Leu 420 425 430 Leu Leu Gly Gln Pro Asp Arg Arg Arg Arg Arg Arg Arg Cys Pro Val 435 440 445 His Ala Gly Glu Ile Gly Gln Asp Thr Cys Ala His Asp Tyr Ile Asp 450 455 460 Gln Val Met Thr Leu Ala Gly Leu Gln Ala Asp Arg Leu His Gly Val 465 470 475 480 Thr Trp Asn Pro Trp Gly Cys Ser Gly Gly Asn Val Leu Ile Gln Asp 485 490 495 Tyr Asn Gly Thr Pro Thr Ser Thr Tyr Gly Glu Gly Tyr Lys Ala His 500 505 510 Leu Leu Ser Val Thr Pro 515 <210> SEQ ID NO 57 <211> LENGTH: 286 <212> TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 57 Met Ser Thr Thr Ser Ala Val Gly Leu Val Leu Ala Arg Ala Pro Arg 1 5 10 15 Leu Leu Gly Ala Glu Pro Phe Phe Met Glu Phe Ile Ser Gly Ile Glu 20 25 30 Glu Arg Leu Ala Glu His Gly Arg Ser Val Leu Leu His Ile Val Ala 35 40 45 Asp His Ala Ala Glu Ile Ala Ala Tyr Arg Arg Trp Ala Gln Leu Arg 50 55 60 Leu Ala Glu Ala Val Val Leu Val Asn Pro Thr Ala Ala Asp Pro Arg 65 70 75 80 Pro Ala Val Leu Arg Asp Leu Gly Leu Pro Val Val Val Ala Gly Glu 85 90 95 Pro Ala Gly Asp Thr Pro Ala Val Arg Arg Asp Asp Val Gly Ser Val 100 105 110 Arg Ala Ala Val Ala His Leu Ala Gly Leu Gly His Arg Arg Ile Ala 115 120 125 Arg Ile Ser Gly Pro Asp Ser Leu Arg His Thr Arg Thr Arg Thr Ala 130 135 140 Ala Leu Leu Ala Ala Ala Ala Pro Ala Gly Ile Asp Ala Val Val Leu 145 150 155 160 Thr Gly Asp Tyr Ser Glu Glu Ser Gly Ala Ala Ala Thr Val Arg Leu 165 170 175 Leu Arg Asp Gly Asp Pro Pro Ser Ala Ile Ile Tyr Asp Asn Asp Leu 180 185 190 Met Ala Val Gly Gly Leu Glu Val Ala Arg Glu Leu Gly Leu Ala Val 195 200 205 Pro Arg Asp Leu Ser Met Leu Ala Trp Asp Asp Ser Ser Leu Cys Arg 210 215 220 Leu Ser Ser Pro Gln Leu Thr Thr Met Ser Leu Asp Val His Glu Phe 225 230 235 240 Gly Ala Leu Val Ala Ala Ser Val Leu Ala Leu Leu Asp Gly Glu Pro 245 250 255 Val Arg Glu Arg Trp Cys Pro Thr Glu Thr Val Ile Ala Arg His Thr 260 265 270 Thr Gly Pro Ala Pro Ala Gly Lys Gln Gly Arg Thr Ser Ala 275 280 285 <210> SEQ ID NO 58 <211> LENGTH: 340 <212> 
TYPE: PRT <213> ORGANISM: M. carbonacea <400> SEQUENCE: 58 Val Gly Val Cys Leu Gly Gln Gly Val Met Thr Tyr Val Leu Thr Asp 1 5 10 15 Leu Thr Gly Ile Val Val Ala Arg Ile Ser Arg Pro Gly Val Gly Val 20 25 30 Glu Ala Pro Ala Thr Val Val Ser Arg Ile Ala Ala Glu Ile Pro Thr 35 40 45 Leu Val Asp Ser Val Gly Val Asp Arg Ala Arg Leu Val Gly Val Gly 50 55 60 Leu Val Phe Pro Gly Pro Leu Thr Gly Arg Gly Val Ala Gly Leu Asn 65 70 75 80 Pro Glu Leu Arg His Trp Arg Glu Phe Pro Leu Gly Ala Ala Leu Glu 85 90 95 Gln Ala Thr Glu Leu Pro Val Val Leu Asp Asn Asp Ala Thr Ala Ala 100 105 110 Ala Leu Gly Glu His Trp Ala Gly Gly Phe Gly Thr Ala Ser Ala Ala 115 120 125 Ala Ala Leu Tyr Met Gly Ser Gly Leu Gly Ala Gly Leu Val Ile Asp 130 135 140 Gly Ile Thr Tyr Arg Gly Pro Ser Gly Asn Ala Gly Glu Leu Gly His 145 150 155 160 Val Cys Val Ala Ala Asp Gly Pro Pro Cys Trp Cys Gly Ala Arg Gly 165 170 175 Cys Val Glu Ala Val Ala Gly Pro Ala Ala Val Val Ala Ala Gly Arg 180 185 190 Ala Asp Ala Gly Leu Ala Arg Ala Leu Gly Leu Thr Thr Gly Ala Gly 195 200 205 Pro Ala Ala Val Ala Arg Asp Phe Ala Ala Ile Gly Arg Ala Ala Arg 210 215 220 Arg Gly Glu Gln Arg Ala Leu Ala Leu Cys Glu Arg Ser Ala Arg Tyr 225 230 235 240 Val Ala Ala Ala Ala Arg Thr Leu Ala Asn Val Met Asp Leu Glu Val 245 250 255 Ile Val Leu Thr Gly Pro Gly Phe Ala Val Ala Gly Ser Val Tyr Leu 260 265 270 Pro Val Leu Arg Gln Glu Leu His Thr Val Phe Ser Arg Ser Ala His 275 280 285 Pro Val Arg Arg Thr Ala Val Pro Val Arg Arg His Arg Val Ala Val 290 295 300 Gly Ala Ala Ala Leu Val Leu Glu Ser Glu Leu Val Pro Phe His Arg 305 310 315 320 Gly Leu Arg Val Ser Glu Ser Leu Asp Gly Glu Phe Ala Pro Leu Pro 325 330 335 Gly Ser Ala Arg 340 

1. An isolated nucleic acid molecule comprising a nucleic acid sequence selected from any of: (a) a nucleic acid encoding any of everninomicin open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58); (b) a nucleic acid encoding a polypeptide encoded by any of everninomicin open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58); and (c) a nucleic acid encoding a polypeptide which is at least 75% identical in amino acid sequence to a polypeptide encoded by any of everninomicin open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 2. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a nucleic acid encoding at least two open reading frames (ORFs) selected from the group consisting of ORF 1 to ORF 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 3. The isolated nucleic acid of claim 2, wherein said nucleic acid comprises a nucleic acid encoding at least three open reading frames (ORFs) selected from the group consisting of ORF 1 to ORF 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 4. An isolated nucleic acid comprising a nucleic acid that hybridizes under stringent conditions to an open reading frame (ORF) of the everninomicin biosynthesis gene cluster and can substitute for the ORF to which it specifically hybridizes to direct the synthesis of an everninomicin.
 5. The isolated nucleic acid of claim 4, wherein the isolated nucleic acid specifically hybridizes under stringent conditions to a nucleic acid encoding a polypeptide selected from the group consisting of ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23 and ORF 24 (SEQ ID NOS: 2, 5 to 7, 9 to 21, and 23 to 29).
 6. The isolated nucleic acid of claim 4 wherein the nucleic acid specifically hybridizes under stringent conditions to a nucleic acid encoding a polypeptide selected from the group consisting of ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, ORF 42, ORF 43, ORF 44, ORF 45, ORF 46, ORF 47, ORF 48 and ORF 49 (SEQ ID NOS: 30 to 35, 37 to 46, 48 and 50 to 58).
 7. The isolated nucleic acid of claim 5 wherein the isolated nucleic acid encodes a polypeptide selected from the group consisting of ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23 and ORF 24 (SEQ ID NOS: 2, 5 to 7, 9 to 21, and 23 to 29).
 8. The isolated nucleic acid of claim 6 wherein the isolated nucleic acid encodes a polypeptide selected from the group consisting of ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, ORF 42, ORF 43, ORF 44, ORF 45, ORF 46, ORF 47, ORF 48 and ORF 49 (SEQ ID NOS: 30 to 35, 37 to 46, 48 and 50 to 58).
 9. An isolated gene cluster comprising open reading frames encoding polypeptides sufficient to direct the synthesis of an everninomicin or an everninomicin analogue.
 10. The isolated gene cluster of claim 9 wherein the gene cluster is present in a bacterium.
 11. The isolated gene cluster of claim 9 wherein the gene cluster is present in E. coli strains DH10B having accession nos. IDAC 240101-1, IDAC 240101-2 and IDAC 240101-3.
 12. An isolated polypeptide comprising a polypeptide sequence selected from any one of: a) a polypeptide of open reading frames 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58); and b) a polypeptide which is at least 75% identical in amino acid sequence to a polypeptide sequence of open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 13. The polypeptide of claim 12, wherein said polypeptide is a polypeptide containing at least two open reading frames selected from open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 14. The polypeptide of claim 12, wherein said polypeptide is a polypeptide containing at least three open reading frames selected from open reading frames (ORFs) 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 15. The polypeptide of claim 12, wherein said polypeptide is a polypeptide containing at least three or more open reading frames selected from open reading frames 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58).
 16. An expression vector comprising a nucleic acid of claim 1.
 17. A host cell transformed with an expression vector of claim 16.
 18. The host cell of claim 17, wherein the cell is transformed with an exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the assembly of an everninomicin or an everninomicin analogue.
 19. A method of chemically modifying a biological molecule, said method comprising contacting a biological molecule that is a substrate for a polypeptide encoded by an everninomicin biosynthesis gene cluster open reading frame with a polypeptide encoded by an everninomicin biosynthesis gene cluster open reading frame whereby said polypeptide chemically modifies said biological molecule.
 20. The method of claim 19 wherein said method comprises contacting said biological molecule with at least two different polypeptides encoded by everninomicin biosynthesis gene cluster open reading frames 1 to 49 (SEQ ID NOS: 2, 5 to 7, 9 to 21, 23 to 35, 37 to 46, 48, and 50 to 58). 
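
Claims 1(c) and 12(b) recite polypeptides that are "at least 75% identical in amino acid sequence" to a listed polypeptide, but the claims themselves do not name an alignment method or an identity formula. The following is a minimal illustrative sketch only, under stated assumptions: it uses a simple Needleman-Wunsch global alignment with unit scores (match +1, mismatch -1, gap -1) and defines percent identity as identical aligned residues divided by the number of alignment columns. The function names `global_align`, `percent_identity`, and `meets_identity_threshold` are hypothetical, the scoring scheme is an assumption rather than the method of the specification, and the variant sequences are invented for illustration. The reference fragment is the first eight residues of SEQ ID NO: 50 in one-letter code.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with unit scores (an assumed scheme)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues over alignment columns (assumed definition)."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(a: str, b: str, threshold: float = 75.0) -> bool:
    """True when the two sequences are at least `threshold` percent identical."""
    return percent_identity(a, b) >= threshold


# First eight residues of SEQ ID NO: 50 (Met Val Asp Leu Leu Thr Gly Val).
reference = "MVDLLTGV"
# Hypothetical variant with one residue deleted, for illustration only.
variant = "MVDLTGV"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant))
```

In practice an established alignment tool with a substitution matrix and affine gap penalties would normally be used, and different identity definitions (for example, dividing by the shorter sequence length) can give different percentages for the same pair of sequences.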